2015S

Fountain*, Crain

2015F

2015F1

2017S1

Find the best model for predicting Y based on X1 and X2. Y is the amount of profit that a company makes in a month. X1 is the number of months that the company has been in business. X2 is the amount spent on advertising.

Consider as predictors all possible linear and quadratic terms (\(X1, X1^2, X2, X2^2\), and \(X1X2\)). Consider possible transformations of Y. Include all appropriate diagnostics. When you have found your “best” model, predict a new Y when X1 = 20 and X2 = $1,500, giving a 95% prediction interval. The data set, shown below, appears in “Profits.xlsx”.
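No worked solution is shown for this problem. A minimal sketch of the requested workflow, using simulated stand-in data (the column names Y, X1, X2 and the coefficients are hypothetical; the real analysis would read “Profits.xlsx”):

```r
# Stand-in data: "Profits.xlsx" is not reproduced here, so we simulate a
# response under the assumed column names Y, X1, X2.
set.seed(1)
dat <- data.frame(X1 = sample(1:60, 40, replace = TRUE),      # months in business
                  X2 = runif(40, 500, 3000))                  # advertising spend
dat$Y <- 50 + 4 * dat$X1 + 0.05 * dat$X2 + rnorm(40, sd = 20) # monthly profit

# Full quadratic model, then stepwise AIC reduction over the candidate terms.
full <- lm(Y ~ X1 + I(X1^2) + X2 + I(X2^2) + X1:X2, data = dat)
best <- step(full, direction = "both", trace = 0)

# 95% prediction interval for a new company with X1 = 20, X2 = 1500.
predict(best, newdata = data.frame(X1 = 20, X2 = 1500),
        interval = "prediction", level = 0.95)
```

On the real data one would also run the usual diagnostics (e.g. plot(best) and a Box-Cox check on Y) before reporting the interval.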

2015F2

A replicated fractional factorial design is used to investigate the effect of five factors on the free height of leaf springs used in an automotive application. The factors are (A) furnace temperature, (B) heating time, (C) transfer time, (D) hold down time, and (E) quench oil temperature. There are 3 observations at each setting.

Write out the alias structure for this design. What is the resolution of this design? Analyze the data. What factors influence the mean free height? The data set appears in the file “Springs.xlsx”.
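No worked solution is shown. The actual fraction used in “Springs.xlsx” is not reproduced here, so the sketch below assumes, for illustration, the common 2^(5-1) half fraction generated by E = ABCD; its defining relation is I = ABCDE, giving resolution V (main effects aliased only with four-factor interactions, two-factor interactions with three-factor interactions). Base R's alias() recovers the alias structure from a saturated fit:

```r
# Assumed design: 2^(5-1) half fraction with generator E = ABCD (hypothetical;
# the real fraction is whatever "Springs.xlsx" encodes).
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
d$E <- d$A * d$B * d$C * d$D        # generator E = ABCD, so I = ABCDE
d$y <- rnorm(nrow(d))               # placeholder response

# Fitting the saturated model on 16 runs exposes the aliasing: alias() lists
# each inestimable effect as a combination of the estimable ones.
fit <- lm(y ~ A * B * C * D * E, data = d)
alias(fit)
```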

2016S

Fountain, Tableman*

2016S1

2017F1

Find the best model for predicting Y (weight) based on X1 (age), X2 (height), and X3 (indicator for male). Consider as predictors all possible linear and quadratic terms. Consider possible transformations of Y. Include all appropriate diagnostics. When you have found your “best” model, predict a new Y when X1 = 26, X2 = 70, and X3 = 1, giving a 95% prediction interval. The data set, shown below, appears in “RegressionSpr16.xlsx”.

table_2016s1 <- readxl::read_xlsx("qe_lab/RegressionSpr16.xlsx")[-1,]
table_2016s1$weight <- round(as.numeric(table_2016s1$weight), 2)
table_2016s1$age <- as.factor(table_2016s1$age)
table_2016s1$height <- round(as.numeric(table_2016s1$height), 2)
table_2016s1$male <- factor(table_2016s1$male, labels=c("female","male"))
str(table_2016s1)
## Classes 'tbl_df', 'tbl' and 'data.frame':    30 obs. of  4 variables:
##  $ weight: num  250 110 243 118 249 ...
##  $ age   : Factor w/ 6 levels "20","21","22",..: 1 1 1 1 1 2 2 2 2 2 ...
##  $ height: num  71 67.2 68.1 67.7 68.6 65.2 67.6 67.4 67.5 69.4 ...
##  $ male  : Factor w/ 2 levels "female","male": 2 1 2 1 2 1 2 1 1 2 ...
library(ggpubr)
ggline(table_2016s1,"height","weight",add = c("mean","jitter"),color = "age" )
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced
ggline(table_2016s1,"height","weight",add = c("mean","jitter"),color = "male",shape = "male" )
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced

model_2016s1 <- lm(weight~height*male*age, table_2016s1)
olsrr::ols_step_both_aic(model_2016s1)
model_2016s1_1 <- lm((weight)~(height):male:age, table_2016s1)
summary(model_2016s1_1)
## 
## Call:
## lm(formula = (weight) ~ (height):male:age, data = table_2016s1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.325  -7.574   0.820   7.236  45.739 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)              -16.955    304.536  -0.056    0.956
## height:malefemale:age20    1.942      4.524   0.429    0.673
## height:malemale:age20      3.818      4.403   0.867    0.398
## height:malefemale:age21    1.928      4.570   0.422    0.678
## height:malemale:age21      3.318      4.454   0.745    0.466
## height:malefemale:age22    1.916      4.603   0.416    0.682
## height:malemale:age22      4.253      4.333   0.981    0.340
## height:malefemale:age23    2.077      4.565   0.455    0.655
## height:malemale:age23      4.458      4.430   1.006    0.328
## height:malefemale:age24    2.012      4.621   0.435    0.669
## height:malemale:age24      4.102      4.383   0.936    0.362
## height:malefemale:age25    1.886      4.730   0.399    0.695
## height:malemale:age25      4.665      4.376   1.066    0.301
## 
## Residual standard error: 26.62 on 17 degrees of freedom
## Multiple R-squared:  0.9399, Adjusted R-squared:  0.8976 
## F-statistic: 22.17 on 12 and 17 DF,  p-value: 5.073e-08
model_2016s1_2 <- lm(log(weight)~male:age+height:male:age, table_2016s1)
summary(model_2016s1_2)
## 
## Call:
## lm(formula = log(weight) ~ male:age + height:male:age, data = table_2016s1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.045786 -0.003582  0.000000  0.000944  0.062574 
## 
## Coefficients: (2 not defined because of singularities)
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               6.992621   0.753588   9.279 3.50e-05 ***
## malefemale:age20        -10.754395   6.240000  -1.723 0.128468    
## malemale:age20           -1.957457   1.272603  -1.538 0.167900    
## malefemale:age21         -8.277268   1.398515  -5.919 0.000588 ***
## malemale:age21           10.295232   1.903129   5.410 0.000998 ***
## malefemale:age22         -8.773831   1.521712  -5.766 0.000687 ***
## malemale:age22          -12.248065   1.543569  -7.935 9.60e-05 ***
## malefemale:age23         -8.304076   1.748874  -4.748 0.002088 ** 
## malemale:age23           -1.320501   0.754287  -1.751 0.123474    
## malefemale:age24         -8.932178   1.198331  -7.454 0.000143 ***
## malemale:age24            4.900144   1.530232   3.202 0.015019 *  
## malefemale:age25        -10.180009   2.245823  -4.533 0.002690 ** 
## malemale:age25                  NA         NA      NA       NA    
## malefemale:age20:height   0.125984   0.091835   1.372 0.212458    
## malemale:age20:height     0.006874   0.014810   0.464 0.656606    
## malefemale:age21:height   0.089870   0.017661   5.089 0.001417 ** 
## malemale:age21:height    -0.174439   0.025510  -6.838 0.000245 ***
## malefemale:age22:height   0.097775   0.019958   4.899 0.001755 ** 
## malemale:age22:height     0.154531   0.019132   8.077 8.57e-05 ***
## malefemale:age23:height   0.091513   0.023633   3.872 0.006114 ** 
## malemale:age23:height           NA         NA      NA       NA    
## malefemale:age24:height   0.101254   0.014121   7.170 0.000182 ***
## malemale:age24:height    -0.090567   0.019132  -4.734 0.002123 ** 
## malefemale:age25:height   0.121461   0.032798   3.703 0.007622 ** 
## malemale:age25:height    -0.018138   0.010819  -1.677 0.137537    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03247 on 7 degrees of freedom
## Multiple R-squared:  0.9987, Adjusted R-squared:  0.9948 
## F-statistic: 251.1 on 22 and 7 DF,  p-value: 3.905e-08
anova(model_2016s1_2)
## Analysis of Variance Table
## 
## Response: log(weight)
##                 Df Sum Sq Mean Sq F value    Pr(>F)    
## male:age        11 5.5405 0.50368 477.776 5.658e-09 ***
## male:age:height 11 0.2839 0.02581  24.485 0.0001567 ***
## Residuals        7 0.0074 0.00105                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model_2016s1_2)
## Warning: not plotting observations with leverage one:
##   2, 4, 7, 10, 13, 15, 19, 23, 24, 27, 29
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
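One requested step is missing above: the 95% prediction interval at X1 = 26, X2 = 70, X3 = 1. Note that with age coded as a factor (levels 20–25, per the str() output above), age 26 cannot be scored at all. A sketch of one workaround, refitting with age numeric, shown on simulated stand-in data since it changes the model:

```r
# Stand-in data with the same layout as RegressionSpr16.xlsx (hypothetical
# coefficients; only the workflow matters here).
set.seed(2)
sim <- data.frame(age    = sample(20:25, 30, replace = TRUE),
                  height = rnorm(30, 68, 2),
                  male   = rbinom(30, 1, 0.5))
sim$weight <- exp(3 + 0.02 * sim$height + 0.4 * sim$male +
                  0.01 * sim$age + rnorm(30, sd = 0.05))

# Age as numeric permits extrapolation to 26; log(weight) matches the
# transformation chosen above.
fit <- lm(log(weight) ~ age + height + male, data = sim)
pi_log <- predict(fit, newdata = data.frame(age = 26, height = 70, male = 1),
                  interval = "prediction", level = 0.95)
exp(pi_log)   # back-transform the interval to the weight scale
```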

2016S2

2017F2

A process engineer is testing the yield of a product manufactured on three specific machines. Each machine can be operated at fixed high and low power settings, although the actual settings differ from one machine to the next. Furthermore, a machine has three stations on which the product is formed, and these are the same for each machine. An experiment is conducted in which each machine is tested at both power settings, and three observations on yield are taken from each station. The runs are made in random order. Analyze this experiment. The data set, shown below, appears in “DesignSpr16.xlsx”.

DesignSpr16 <- readxl::read_excel("qe_lab/DesignSpr16.xlsx")
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * … and 4 more problems
library(tidyverse)
table_2016s2 <- gather(DesignSpr16[c(2:4,6:8),c(2:4,6:8,10:12)])
names(table_2016s2) <- c("machine","y")
table_2016s2 <- table_2016s2[c("y","machine")]
table_2016s2$machine <- as.factor(c(rep("machine1",18),rep("machine2",18),rep("machine3",18)) )
table_2016s2$station <- as.factor(rep(c(rep("station1",6),rep("station2",6),rep("station3",6)),3) )
table_2016s2$power <- as.factor(rep(c(rep("power1",3),rep("power2",3)),9) )
str(table_2016s2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    54 obs. of  4 variables:
##  $ y      : num  34.1 30.3 31.6 24.3 26.3 27.1 33.7 34.9 35 28.1 ...
##  $ machine: Factor w/ 3 levels "machine1","machine2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ station: Factor w/ 3 levels "station1","station2",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ power  : Factor w/ 2 levels "power1","power2": 1 1 1 2 2 2 1 1 1 2 ...
library(ggpubr)
ggline(table_2016s2,"machine","y",add = c("mean","jitter"),color = "station",shape = "station")
ggline(table_2016s2,"machine","y",add = c("mean","jitter"),color = "power",shape = "power")

model_2016s2 <- lm(y~machine*station*power, table_2016s2)
summary(model_2016s2)
## 
## Call:
## lm(formula = y ~ machine * station * power, data = table_2016s2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0000 -0.6500  0.1000  0.7583  2.2000 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                                  32.0000     0.7562  42.316
## machinemachine2                               0.8667     1.0694   0.810
## machinemachine3                               1.0000     1.0694   0.935
## stationstation2                               2.5333     1.0694   2.369
## stationstation3                               4.7000     1.0694   4.395
## powerpower2                                  -6.1000     1.0694  -5.704
## machinemachine2:stationstation2              -1.5000     1.5124  -0.992
## machinemachine3:stationstation2              -2.2000     1.5124  -1.455
## machinemachine2:stationstation3              -3.5000     1.5124  -2.314
## machinemachine3:stationstation3              -5.0000     1.5124  -3.306
## machinemachine2:powerpower2                  -1.6333     1.5124  -1.080
## machinemachine3:powerpower2                  -1.7000     1.5124  -1.124
## stationstation2:powerpower2                   0.2333     1.5124   0.154
## stationstation3:powerpower2                  -5.0333     1.5124  -3.328
## machinemachine2:stationstation2:powerpower2  -0.7000     2.1389  -0.327
## machinemachine3:stationstation2:powerpower2   0.4333     2.1389   0.203
## machinemachine2:stationstation3:powerpower2   4.3667     2.1389   2.042
## machinemachine3:stationstation3:powerpower2   3.9667     2.1389   1.855
##                                             Pr(>|t|)    
## (Intercept)                                  < 2e-16 ***
## machinemachine2                              0.42304    
## machinemachine3                              0.35598    
## stationstation2                              0.02333 *  
## stationstation3                             9.38e-05 ***
## powerpower2                                 1.73e-06 ***
## machinemachine2:stationstation2              0.32792    
## machinemachine3:stationstation2              0.15444    
## machinemachine2:stationstation3              0.02648 *  
## machinemachine3:stationstation3              0.00215 ** 
## machinemachine2:powerpower2                  0.28735    
## machinemachine3:powerpower2                  0.26844    
## stationstation2:powerpower2                  0.87825    
## stationstation3:powerpower2                  0.00203 ** 
## machinemachine2:stationstation2:powerpower2  0.74536    
## machinemachine3:stationstation2:powerpower2  0.84059    
## machinemachine2:stationstation3:powerpower2  0.04858 *  
## machinemachine3:stationstation3:powerpower2  0.07187 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.31 on 36 degrees of freedom
## Multiple R-squared:  0.9377, Adjusted R-squared:  0.9083 
## F-statistic:  31.9 on 17 and 36 DF,  p-value: < 2.2e-16
anova(model_2016s2)
## Analysis of Variance Table
## 
## Response: y
##                       Df Sum Sq Mean Sq  F value    Pr(>F)    
## machine                2  21.44   10.72   6.2475  0.004687 ** 
## station                2  16.98    8.49   4.9489  0.012623 *  
## power                  1 845.70  845.70 492.9587 < 2.2e-16 ***
## machine:station        4  16.60    4.15   2.4195  0.066255 .  
## machine:power          2   0.38    0.19   0.1115  0.894793    
## station:power          2  16.30    8.15   4.7514  0.014749 *  
## machine:station:power  4  12.91    3.23   1.8806  0.135072    
## Residuals             36  61.76    1.72                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_2016s2_1 <- lm(y~power:machine:station, table_2016s2)
summary(model_2016s2_1)
## 
## Call:
## lm(formula = y ~ power:machine:station, data = table_2016s2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0000 -0.6500  0.1000  0.7583  2.2000 
## 
## Coefficients: (1 not defined because of singularities)
##                                             Estimate Std. Error t value
## (Intercept)                                  23.8333     0.7562  31.517
## powerpower1:machinemachine1:stationstation1   8.1667     1.0694   7.636
## powerpower2:machinemachine1:stationstation1   2.0667     1.0694   1.932
## powerpower1:machinemachine2:stationstation1   9.0333     1.0694   8.447
## powerpower2:machinemachine2:stationstation1   1.3000     1.0694   1.216
## powerpower1:machinemachine3:stationstation1   9.1667     1.0694   8.571
## powerpower2:machinemachine3:stationstation1   1.3667     1.0694   1.278
## powerpower1:machinemachine1:stationstation2  10.7000     1.0694  10.005
## powerpower2:machinemachine1:stationstation2   4.8333     1.0694   4.519
## powerpower1:machinemachine2:stationstation2  10.0667     1.0694   9.413
## powerpower2:machinemachine2:stationstation2   1.8667     1.0694   1.745
## powerpower1:machinemachine3:stationstation2   9.5000     1.0694   8.883
## powerpower2:machinemachine3:stationstation2   2.3667     1.0694   2.213
## powerpower1:machinemachine1:stationstation3  12.8667     1.0694  12.031
## powerpower2:machinemachine1:stationstation3   1.7333     1.0694   1.621
## powerpower1:machinemachine2:stationstation3  10.2333     1.0694   9.569
## powerpower2:machinemachine2:stationstation3   1.8333     1.0694   1.714
## powerpower1:machinemachine3:stationstation3   8.8667     1.0694   8.291
## powerpower2:machinemachine3:stationstation3       NA         NA      NA
##                                             Pr(>|t|)    
## (Intercept)                                  < 2e-16 ***
## powerpower1:machinemachine1:stationstation1 4.89e-09 ***
## powerpower2:machinemachine1:stationstation1   0.0612 .  
## powerpower1:machinemachine2:stationstation1 4.60e-10 ***
## powerpower2:machinemachine2:stationstation1   0.2321    
## powerpower1:machinemachine3:stationstation1 3.22e-10 ***
## powerpower2:machinemachine3:stationstation1   0.2095    
## powerpower1:machinemachine1:stationstation2 6.13e-12 ***
## powerpower2:machinemachine1:stationstation2 6.46e-05 ***
## powerpower1:machinemachine2:stationstation2 3.05e-11 ***
## powerpower2:machinemachine2:stationstation2   0.0894 .  
## powerpower1:machinemachine3:stationstation2 1.33e-10 ***
## powerpower2:machinemachine3:stationstation2   0.0333 *  
## powerpower1:machinemachine1:stationstation3 3.57e-14 ***
## powerpower2:machinemachine1:stationstation3   0.1138    
## powerpower1:machinemachine2:stationstation3 1.99e-11 ***
## powerpower2:machinemachine2:stationstation3   0.0951 .  
## powerpower1:machinemachine3:stationstation3 7.21e-10 ***
## powerpower2:machinemachine3:stationstation3       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.31 on 36 degrees of freedom
## Multiple R-squared:  0.9377, Adjusted R-squared:  0.9083 
## F-statistic:  31.9 on 17 and 36 DF,  p-value: < 2.2e-16
anova(model_2016s2_1)
## Analysis of Variance Table
## 
## Response: y
##                       Df Sum Sq Mean Sq F value    Pr(>F)    
## power:machine:station 17 930.31  54.724  31.899 < 2.2e-16 ***
## Residuals             36  61.76   1.716                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model_2016s2_1)

model_2016s2_2 <- lm(y~machine+power+station, table_2016s2)
summary(model_2016s2_2)
## 
## Call:
## lm(formula = y ~ machine + power + station, data = table_2016s2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5148 -0.9218  0.1176  1.1222  2.5463 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      33.8148     0.4999  67.644  < 2e-16 ***
## machinemachine2  -1.0056     0.4999  -2.012  0.04990 *  
## machinemachine3  -1.5167     0.4999  -3.034  0.00389 ** 
## powerpower2      -7.9148     0.4082 -19.391  < 2e-16 ***
## stationstation2   1.3722     0.4999   2.745  0.00849 ** 
## stationstation3   0.7389     0.4999   1.478  0.14591    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.5 on 48 degrees of freedom
## Multiple R-squared:  0.8912, Adjusted R-squared:  0.8798 
## F-statistic: 78.62 on 5 and 48 DF,  p-value: < 2.2e-16
anova(model_2016s2_2)
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq  F value  Pr(>F)    
## machine    2  21.44   10.72   4.7656 0.01295 *  
## power      1 845.70  845.70 376.0282 < 2e-16 ***
## station    2  16.98    8.49   3.7750 0.03002 *  
## Residuals 48 107.95    2.25                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
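Because the problem states that the actual power settings differ from one machine to the next, power is arguably nested within machine rather than crossed with it, which none of the models above encode. A sketch of that alternative specification, on simulated stand-in data with the same 3 × 2 × 3 × 3 layout (on the real data, table_2016s2 would replace sim):

```r
# Stand-in data: 3 machines x 2 power settings x 3 stations x 3 replicates.
set.seed(3)
sim <- expand.grid(machine = factor(1:3), power = factor(1:2),
                   station = factor(1:3), rep = 1:3)
sim$y <- 30 - 6 * (sim$power == "2") + rnorm(nrow(sim), sd = 1.3)

# machine/power expands to machine + machine:power, i.e. power nested within
# machine (3 df for power-within-machine instead of power + machine:power).
fit <- aov(y ~ machine / power + station + machine:station, data = sim)
summary(fit)
```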

2016F

Jong Sung Kim*, Brad Crain

2016F1

A national insurance organization wanted to study the consumption pattern of cigarettes in all 50 states and the District of Columbia. Data were collected for 1960, 1970, and 1980, but we will focus here on 1970. Using data from 1970, the organization wanted to construct a regression equation that relates statewide cigarette consumption (on a per capita basis) to various socioeconomic and demographic variables, and to determine whether these variables were useful in predicting the consumption of cigarettes. The variables chosen for study are given below.

Age, x1: Median age of a person living in the state

Education, x2: Percentage of people over 25 years of age in a state that had completed high school

Income, x3: Per capita personal income for a state (in dollars)

Perblack, x4: Percentage of blacks living in a state

Perfem, x5: Percentage of females living in a state

Price, x6: Average price of a pack of cigarettes in a state (in cents)

Scig, y: Number of packs of cigarettes sold in a state on a per capita basis.

The data on these variables are stored in 8 columns in the same order as listed above; a two-letter alphabetic code is given first, however. The data are saved as “cigcons.xlsx”.

Perform a complete regression analysis on these data, including checking of model assumptions and attempting appropriate remedies, if needed. The main objective of the analysis is to find the smallest number of variables that describes the state sale of cigarettes meaningfully and adequately. You might want to consider, among other things, partial regression plots, interaction terms, outlier and influential-case analysis, a Box-Cox transformation, and an explanation of your final model.

table_2016f1 <- readxl::read_xlsx("qe_lab/cigcons.xlsx")
table_2016f1$State <- as.factor(table_2016f1$State)
str(table_2016f1)
## Classes 'tbl_df', 'tbl' and 'data.frame':    51 obs. of  8 variables:
##  $ State    : Factor w/ 51 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
##  $ Age      : num  27 22.9 26.3 29.1 28.1 26.2 29.1 26.8 28.4 32.3 ...
##  $ Education: num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 55.2 52.6 ...
##  $ Income   : num  2948 4644 3655 2878 4493 ...
##  $ perblack : num  26.2 3 3 18.3 7 3 6 14.3 71.1 15.3 ...
##  $ perfem   : num  51.7 45.7 50.8 51.5 50.8 50.7 51.5 51.3 53.5 51.8 ...
##  $ price    : num  42.7 41.8 38.5 38.8 39.7 31.1 45.5 41.3 32.6 43.8 ...
##  $ Scig     : num  89.8 121.3 115.2 100.3 123 ...
library(GGally)
ggpairs(table_2016f1[,-1])

model_2016f1 <- lm(Scig~price*perfem*perblack*Income*Education*Age, table_2016f1)
olsrr::ols_step_both_aic(model_2016f1)
olsrr::ols_step_both_p(model_2016f1)
##       perfem:Income:Age perfem:Income:Age:price 
##                5.380033                5.380033
## 
## Call:
## lm(formula = Scig ~ perfem:Income:Age + price:perfem:Income:Age, 
##     data = table_2016f1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.743 -12.457  -4.995   3.835 129.705 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              4.067e+01  2.127e+01   1.912  0.06187 .  
## perfem:Income:Age        3.847e-05  8.784e-06   4.379 6.43e-05 ***
## perfem:Income:Age:price -6.053e-07  1.770e-07  -3.420  0.00129 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.36 on 48 degrees of freedom
## Multiple R-squared:  0.3013, Adjusted R-squared:  0.2721 
## F-statistic: 10.35 on 2 and 48 DF,  p-value: 0.0001835
## Analysis of Variance Table
## 
## Response: Scig
##                         Df Sum Sq Mean Sq F value   Pr(>F)   
## perfem:Income:Age        1   6735  6734.6  8.9961 0.004279 **
## perfem:Income:Age:price  1   8758  8757.6 11.6985 0.001286 **
## Residuals               48  35933   748.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##        log(price) perfem:Income:Age 
##          1.066268          1.066268
## 
## Call:
## lm(formula = log(Scig) ~ perfem:Income:Age + log(price), data = table_2016f1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43922 -0.07364 -0.02540  0.05006  0.70893 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        7.407e+00  8.503e-01   8.711 1.89e-11 ***
## log(price)        -8.993e-01  2.405e-01  -3.739 0.000493 ***
## perfem:Income:Age  1.197e-07  2.657e-08   4.506 4.24e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1859 on 48 degrees of freedom
## Multiple R-squared:  0.3651, Adjusted R-squared:  0.3386 
## F-statistic:  13.8 on 2 and 48 DF,  p-value: 1.843e-05
## Analysis of Variance Table
## 
## Response: log(Scig)
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## log(price)         1 0.25199 0.25199  7.2931  0.009534 ** 
## perfem:Income:Age  1 0.70156 0.70156 20.3043 4.236e-05 ***
## Residuals         48 1.65850 0.03455                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
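The problem statement suggests a Box-Cox transformation, which the output above does not show. A minimal sketch of that step on simulated stand-in data (on the real data one would pass the fitted cigcons model to boxcox):

```r
library(MASS)  # boxcox()

# Stand-in data: a positive response that is log-normal in x, so the profile
# likelihood should favor a lambda near 0 (the log transformation).
set.seed(4)
x <- runif(51, 30, 50)
y <- exp(5 - 0.03 * x + rnorm(51, sd = 0.2))

bc <- boxcox(lm(y ~ x), plotit = FALSE)   # profile log-likelihood over lambda
lambda_hat <- bc$x[which.max(bc$y)]       # MLE of the transformation power
lambda_hat
```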

2016F2

An experiment is conducted to compare the water quality of three creeks in an area. Five water samples are selected from each creek. Each sample is divided into two parts, and the dissolved oxygen content is measured for each part. (Higher dissolved oxygen contents indicate higher water quality.) The results are given as follows:

Creek/Water Sample      1          2          3          4          5
1                    5.2, 5.4   5.6, 5.7   5.4, 5.4   5.6, 5.5   5.8, 5.5
2                    5.1, 5.3   5.1, 5.0   5.3, 5.2   5.0, 5.0   4.9, 5.1
3                    5.9, 5.8   5.8, 5.8   5.7, 5.8   5.8, 5.9   5.9, 5.9
  1. Write down an appropriate model with assumptions (including normality).

Two-stage nested design:

\(y_{ijk}=\mu+\tau_i+\beta_{j(i)}+\varepsilon_{k(ij)}\), \(i=1,2,3\); \(j=1,2,3,4,5\); \(k=1,2\), where \(\tau_i\) is the creek effect, \(\beta_{j(i)}\sim N(0,\sigma_\beta^2)\) is the random effect of sample \(j\) within creek \(i\), and \(\varepsilon_{k(ij)}\sim N(0,\sigma^2)\), all independent.

  2. Find the ANOVA table for the data.

  3. Perform the F-test comparing the creeks using a .05 level.

  4. Perform a Tukey multiple comparison on the creeks using a .05 level.
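The creek F-test can be verified by hand. With samples random within creeks, the expected mean squares imply that MS(creek) must be tested against MS(sample within creek) on (2, 12) degrees of freedom, not against the replicate-level residual. Using the data from the table above:

```r
# Dissolved oxygen data from the table above: 3 creeks x 5 samples x 2 parts.
oxygen <- c(5.2, 5.4, 5.6, 5.7, 5.4, 5.4, 5.6, 5.5, 5.8, 5.5,   # creek 1
            5.1, 5.3, 5.1, 5.0, 5.3, 5.2, 5.0, 5.0, 4.9, 5.1,   # creek 2
            5.9, 5.8, 5.8, 5.8, 5.7, 5.8, 5.8, 5.9, 5.9, 5.9)   # creek 3
creek <- factor(rep(1:3, each = 10))
samp  <- factor(rep(rep(1:5, each = 2), times = 3))

# creek/samp fits creek + sample-within-creek; the correct creek F-ratio
# divides by the nested-sample mean square, not the residual mean square.
tab <- anova(lm(oxygen ~ creek / samp))
F_creek <- tab["creek", "Mean Sq"] / tab["creek:samp", "Mean Sq"]
p_creek <- pf(F_creek, 2, 12, lower.tail = FALSE)
c(F = F_creek, p = p_creek)   # F = 48.69, p = 1.74e-06, matching the lmer fit
```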

## 'data.frame':    30 obs. of  4 variables:
##  $ creek : Factor w/ 3 levels "creek1","creek2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ oxygen: num  5.2 5.4 5.6 5.7 5.4 5.4 5.6 5.5 5.8 5.5 ...
##  $ sample: Factor w/ 5 levels "sample1","sample2",..: 1 1 2 2 3 3 4 4 5 5 ...
##  $ rep   : Factor w/ 2 levels "rep1","rep2": 1 2 1 2 1 2 1 2 1 2 ...

## 
## Call:
## lm(formula = oxygen ~ creek/sample, data = table_2016f2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -0.15  -0.05   0.00   0.05   0.15 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                5.300e+00  6.831e-02  77.584  < 2e-16 ***
## creekcreek2               -1.000e-01  9.661e-02  -1.035  0.31702    
## creekcreek3                5.500e-01  9.661e-02   5.693 4.26e-05 ***
## creekcreek1:samplesample2  3.500e-01  9.661e-02   3.623  0.00251 ** 
## creekcreek2:samplesample2 -1.500e-01  9.661e-02  -1.553  0.14135    
## creekcreek3:samplesample2 -5.000e-02  9.661e-02  -0.518  0.61232    
## creekcreek1:samplesample3  1.000e-01  9.661e-02   1.035  0.31702    
## creekcreek2:samplesample3  5.000e-02  9.661e-02   0.518  0.61232    
## creekcreek3:samplesample3 -1.000e-01  9.661e-02  -1.035  0.31702    
## creekcreek1:samplesample4  2.500e-01  9.661e-02   2.588  0.02060 *  
## creekcreek2:samplesample4 -2.000e-01  9.661e-02  -2.070  0.05611 .  
## creekcreek3:samplesample4 -2.764e-17  9.661e-02   0.000  1.00000    
## creekcreek1:samplesample5  3.500e-01  9.661e-02   3.623  0.00251 ** 
## creekcreek2:samplesample5 -2.000e-01  9.661e-02  -2.070  0.05611 .  
## creekcreek3:samplesample5  5.000e-02  9.661e-02   0.518  0.61232    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09661 on 15 degrees of freedom
## Multiple R-squared:  0.9555, Adjusted R-squared:  0.914 
## F-statistic: 23.02 on 14 and 15 DF,  p-value: 1.305e-07
## Analysis of Variance Table
## 
## Response: oxygen
##              Df Sum Sq Mean Sq  F value    Pr(>F)    
## creek         2  2.678 1.33900 143.4643 1.665e-10 ***
## creek:sample 12  0.330 0.02750   2.9464   0.02559 *  
## Residuals    15  0.140 0.00933                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##              GVIF Df GVIF^(1/(2*Df))
## creek          25  2        2.236068
## creek:sample   25 12        1.143530

## Linear mixed model fit by REML ['lmerMod']
## Formula: oxygen ~ creek + (1 | creek:sample)
##    Data: table_2016f2
## 
## REML criterion at convergence: -29.7
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.77284 -0.43208 -0.06556  0.52633  2.04448 
## 
## Random effects:
##  Groups       Name        Variance Std.Dev.
##  creek:sample (Intercept) 0.009083 0.09531 
##  Residual                 0.009333 0.09661 
## Number of obs: 30, groups:  creek:sample, 15
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  5.51000    0.05244 105.071
## creekcreek2 -0.41000    0.07416  -5.528
## creekcreek3  0.32000    0.07416   4.315
## 
## Correlation of Fixed Effects:
##             (Intr) crkcr2
## creekcreek2 -0.707       
## creekcreek3 -0.707  0.500
## Analysis of Variance Table
##       Df  Sum Sq Mean Sq F value
## creek  2 0.90889 0.45444  48.691
## [1] 1.743538e-06
## Computing profile confidence intervals ...
##                   2.5 %     97.5 %
## .sig01       0.00000000  0.1425879
## .sigma       0.07016639  0.1450729
## (Intercept)  5.41185731  5.6081427
## creekcreek2 -0.54879472 -0.2712053
## creekcreek3  0.18120528  0.4587947
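The REML variance components above can be cross-checked by method of moments from the nested ANOVA mean squares reported earlier (0.330 on 12 df for creek:sample, 0.140 on 15 df residual, with two replicates per creek:sample cell). A minimal sketch:

```r
# Method-of-moments check of the lmer variance components: with 2 replicates
# per creek:sample cell, E[MS(creek:sample)] = sigma^2 + 2 * sigma^2_sample,
# so the sample component is (MS_nested - MS_resid) / 2.
ms_nested <- 0.330 / 12   # creek:sample mean square from the nested ANOVA
ms_resid  <- 0.140 / 15   # residual mean square
sigma2_sample <- (ms_nested - ms_resid) / 2
round(sigma2_sample, 6)   # ~0.009083, matching the lmer estimate
round(ms_resid, 6)        # ~0.009333, the residual variance
```

The near-exact agreement with the REML output is expected here because the design is balanced.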

## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
## 
##     count
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
## Analysis of Variance Table
## 
## Response: oxygen
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## creek_f           2  2.678 1.33900 48.6909 1.743e-06 ***
## creek_f:sample_r 12  0.330 0.02750  2.9464   0.02559 *  
## Residual         15  0.140 0.00933                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
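The creek F test in this table can be reproduced by hand. Because sample is nested (and random) within creek, MS(creek:sample) — not MS(residual) — is the appropriate error term for creek:

```r
# Hand computation of the creek F ratio using the nested term as error.
ms_creek  <- 2.678 / 2    # creek mean square (Sum Sq / Df)
ms_nested <- 0.330 / 12   # creek:sample mean square, the error term
f_creek   <- ms_creek / ms_nested
p_creek   <- pf(f_creek, df1 = 2, df2 = 12, lower.tail = FALSE)
round(f_creek, 4)   # 48.6909, as in the table
signif(p_creek, 4)  # 1.744e-06
```

This matches both the corrected ANOVA above and the p-value computed from the lmer fit.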

library(emmeans)
cre_sam <- pairs(lsmeans(model_2016f2_1,~creek|sample))
sam_cre <- pairs(lsmeans(model_2016f2_1,~sample|creek))

library(kableExtra)
kable(test(rbind(cre_sam, sam_cre), adjust = "tukey"), format = "latex") %>%
  kable_styling("condensed", full_width = FALSE, font_size = 8) %>%
  row_spec(2, bold = TRUE, background = "#EAFAF1")
TukeyHSD(model_2016f2_3,conf.level=0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = oxygen ~ creek_f + sample_r %in% creek_f, data = table_2016f2)
## 
## $creek_f
##                diff        lwr        upr   p adj
## creek2-creek1 -0.41 -0.5222235 -0.2977765 3.0e-07
## creek3-creek1  0.32  0.2077765  0.4322235 6.2e-06
## creek3-creek2  0.73  0.6177765  0.8422235 0.0e+00
## 
## $`creek_f:sample_r`
##                                diff        lwr        upr     p adj
## creek2:sample1-creek1:sample1 -0.10 -0.4859204  0.2859204 0.9982090
## creek3:sample1-creek1:sample1  0.55  0.1640796  0.9359204 0.0024451
## creek1:sample2-creek1:sample1  0.35 -0.0359204  0.7359204 0.0948215
## creek2:sample2-creek1:sample1 -0.25 -0.6359204  0.1359204 0.4419456
## creek3:sample2-creek1:sample1  0.50  0.1140796  0.8859204 0.0060915
## creek1:sample3-creek1:sample1  0.10 -0.2859204  0.4859204 0.9982090
## creek2:sample3-creek1:sample1 -0.05 -0.4359204  0.3359204 0.9999994
## creek3:sample3-creek1:sample1  0.45  0.0640796  0.8359204 0.0153692
## creek1:sample4-creek1:sample1  0.25 -0.1359204  0.6359204 0.4419456
## creek2:sample4-creek1:sample1 -0.30 -0.6859204  0.0859204 0.2177180
## creek3:sample4-creek1:sample1  0.55  0.1640796  0.9359204 0.0024451
## creek1:sample5-creek1:sample1  0.35 -0.0359204  0.7359204 0.0948215
## creek2:sample5-creek1:sample1 -0.30 -0.6859204  0.0859204 0.2177180
## creek3:sample5-creek1:sample1  0.60  0.2140796  0.9859204 0.0010028
## creek3:sample1-creek2:sample1  0.65  0.2640796  1.0359204 0.0004223
## creek1:sample2-creek2:sample1  0.45  0.0640796  0.8359204 0.0153692
## creek2:sample2-creek2:sample1 -0.15 -0.5359204  0.2359204 0.9460679
## creek3:sample2-creek2:sample1  0.60  0.2140796  0.9859204 0.0010028
## creek1:sample3-creek2:sample1  0.20 -0.1859204  0.5859204 0.7351714
## creek2:sample3-creek2:sample1  0.05 -0.3359204  0.4359204 0.9999994
## creek3:sample3-creek2:sample1  0.55  0.1640796  0.9359204 0.0024451
## creek1:sample4-creek2:sample1  0.35 -0.0359204  0.7359204 0.0948215
## creek2:sample4-creek2:sample1 -0.20 -0.5859204  0.1859204 0.7351714
## creek3:sample4-creek2:sample1  0.65  0.2640796  1.0359204 0.0004223
## creek1:sample5-creek2:sample1  0.45  0.0640796  0.8359204 0.0153692
## creek2:sample5-creek2:sample1 -0.20 -0.5859204  0.1859204 0.7351714
## creek3:sample5-creek2:sample1  0.70  0.3140796  1.0859204 0.0001830
## creek1:sample2-creek3:sample1 -0.20 -0.5859204  0.1859204 0.7351714
## creek2:sample2-creek3:sample1 -0.80 -1.1859204 -0.4140796 0.0000376
## creek3:sample2-creek3:sample1 -0.05 -0.4359204  0.3359204 0.9999994
## creek1:sample3-creek3:sample1 -0.45 -0.8359204 -0.0640796 0.0153692
## creek2:sample3-creek3:sample1 -0.60 -0.9859204 -0.2140796 0.0010028
## creek3:sample3-creek3:sample1 -0.10 -0.4859204  0.2859204 0.9982090
## creek1:sample4-creek3:sample1 -0.30 -0.6859204  0.0859204 0.2177180
## creek2:sample4-creek3:sample1 -0.85 -1.2359204 -0.4640796 0.0000178
## creek3:sample4-creek3:sample1  0.00 -0.3859204  0.3859204 1.0000000
## creek1:sample5-creek3:sample1 -0.20 -0.5859204  0.1859204 0.7351714
## creek2:sample5-creek3:sample1 -0.85 -1.2359204 -0.4640796 0.0000178
## creek3:sample5-creek3:sample1  0.05 -0.3359204  0.4359204 0.9999994
## creek2:sample2-creek1:sample2 -0.60 -0.9859204 -0.2140796 0.0010028
## creek3:sample2-creek1:sample2  0.15 -0.2359204  0.5359204 0.9460679
## creek1:sample3-creek1:sample2 -0.25 -0.6359204  0.1359204 0.4419456
## creek2:sample3-creek1:sample2 -0.40 -0.7859204 -0.0140796 0.0386879
## creek3:sample3-creek1:sample2  0.10 -0.2859204  0.4859204 0.9982090
## creek1:sample4-creek1:sample2 -0.10 -0.4859204  0.2859204 0.9982090
## creek2:sample4-creek1:sample2 -0.65 -1.0359204 -0.2640796 0.0004223
## creek3:sample4-creek1:sample2  0.20 -0.1859204  0.5859204 0.7351714
## creek1:sample5-creek1:sample2  0.00 -0.3859204  0.3859204 1.0000000
## creek2:sample5-creek1:sample2 -0.65 -1.0359204 -0.2640796 0.0004223
## creek3:sample5-creek1:sample2  0.25 -0.1359204  0.6359204 0.4419456
## creek3:sample2-creek2:sample2  0.75  0.3640796  1.1359204 0.0000817
## creek1:sample3-creek2:sample2  0.35 -0.0359204  0.7359204 0.0948215
## creek2:sample3-creek2:sample2  0.20 -0.1859204  0.5859204 0.7351714
## creek3:sample3-creek2:sample2  0.70  0.3140796  1.0859204 0.0001830
## creek1:sample4-creek2:sample2  0.50  0.1140796  0.8859204 0.0060915
## creek2:sample4-creek2:sample2 -0.05 -0.4359204  0.3359204 0.9999994
## creek3:sample4-creek2:sample2  0.80  0.4140796  1.1859204 0.0000376
## creek1:sample5-creek2:sample2  0.60  0.2140796  0.9859204 0.0010028
## creek2:sample5-creek2:sample2 -0.05 -0.4359204  0.3359204 0.9999994
## creek3:sample5-creek2:sample2  0.85  0.4640796  1.2359204 0.0000178
## creek1:sample3-creek3:sample2 -0.40 -0.7859204 -0.0140796 0.0386879
## creek2:sample3-creek3:sample2 -0.55 -0.9359204 -0.1640796 0.0024451
## creek3:sample3-creek3:sample2 -0.05 -0.4359204  0.3359204 0.9999994
## creek1:sample4-creek3:sample2 -0.25 -0.6359204  0.1359204 0.4419456
## creek2:sample4-creek3:sample2 -0.80 -1.1859204 -0.4140796 0.0000376
## creek3:sample4-creek3:sample2  0.05 -0.3359204  0.4359204 0.9999994
## creek1:sample5-creek3:sample2 -0.15 -0.5359204  0.2359204 0.9460679
## creek2:sample5-creek3:sample2 -0.80 -1.1859204 -0.4140796 0.0000376
## creek3:sample5-creek3:sample2  0.10 -0.2859204  0.4859204 0.9982090
## creek2:sample3-creek1:sample3 -0.15 -0.5359204  0.2359204 0.9460679
## creek3:sample3-creek1:sample3  0.35 -0.0359204  0.7359204 0.0948215
## creek1:sample4-creek1:sample3  0.15 -0.2359204  0.5359204 0.9460679
## creek2:sample4-creek1:sample3 -0.40 -0.7859204 -0.0140796 0.0386879
## creek3:sample4-creek1:sample3  0.45  0.0640796  0.8359204 0.0153692
## creek1:sample5-creek1:sample3  0.25 -0.1359204  0.6359204 0.4419456
## creek2:sample5-creek1:sample3 -0.40 -0.7859204 -0.0140796 0.0386879
## creek3:sample5-creek1:sample3  0.50  0.1140796  0.8859204 0.0060915
## creek3:sample3-creek2:sample3  0.50  0.1140796  0.8859204 0.0060915
## creek1:sample4-creek2:sample3  0.30 -0.0859204  0.6859204 0.2177180
## creek2:sample4-creek2:sample3 -0.25 -0.6359204  0.1359204 0.4419456
## creek3:sample4-creek2:sample3  0.60  0.2140796  0.9859204 0.0010028
## creek1:sample5-creek2:sample3  0.40  0.0140796  0.7859204 0.0386879
## creek2:sample5-creek2:sample3 -0.25 -0.6359204  0.1359204 0.4419456
## creek3:sample5-creek2:sample3  0.65  0.2640796  1.0359204 0.0004223
## creek1:sample4-creek3:sample3 -0.20 -0.5859204  0.1859204 0.7351714
## creek2:sample4-creek3:sample3 -0.75 -1.1359204 -0.3640796 0.0000817
## creek3:sample4-creek3:sample3  0.10 -0.2859204  0.4859204 0.9982090
## creek1:sample5-creek3:sample3 -0.10 -0.4859204  0.2859204 0.9982090
## creek2:sample5-creek3:sample3 -0.75 -1.1359204 -0.3640796 0.0000817
## creek3:sample5-creek3:sample3  0.15 -0.2359204  0.5359204 0.9460679
## creek2:sample4-creek1:sample4 -0.55 -0.9359204 -0.1640796 0.0024451
## creek3:sample4-creek1:sample4  0.30 -0.0859204  0.6859204 0.2177180
## creek1:sample5-creek1:sample4  0.10 -0.2859204  0.4859204 0.9982090
## creek2:sample5-creek1:sample4 -0.55 -0.9359204 -0.1640796 0.0024451
## creek3:sample5-creek1:sample4  0.35 -0.0359204  0.7359204 0.0948215
## creek3:sample4-creek2:sample4  0.85  0.4640796  1.2359204 0.0000178
## creek1:sample5-creek2:sample4  0.65  0.2640796  1.0359204 0.0004223
## creek2:sample5-creek2:sample4  0.00 -0.3859204  0.3859204 1.0000000
## creek3:sample5-creek2:sample4  0.90  0.5140796  1.2859204 0.0000087
## creek1:sample5-creek3:sample4 -0.20 -0.5859204  0.1859204 0.7351714
## creek2:sample5-creek3:sample4 -0.85 -1.2359204 -0.4640796 0.0000178
## creek3:sample5-creek3:sample4  0.05 -0.3359204  0.4359204 0.9999994
## creek2:sample5-creek1:sample5 -0.65 -1.0359204 -0.2640796 0.0004223
## creek3:sample5-creek1:sample5  0.25 -0.1359204  0.6359204 0.4419456
## creek3:sample5-creek2:sample5  0.90  0.5140796  1.2859204 0.0000087
cre_sam <- pairs(lsmeans(model_2016f2_3,~creek_f|sample_r))
sam_cre <- pairs(lsmeans(model_2016f2_3,~sample_r|creek_f))

kable(test(rbind(cre_sam, sam_cre), adjust = "tukey"), format = "latex") %>%
  kable_styling("condensed", full_width = FALSE, font_size = 8) %>%
  row_spec(2, bold = TRUE, background = "#EAFAF1")

2017S

Brad Crain, Jong Sung Kim*

2017SR1

2015F1

Find the best model for predicting Y based on X1 and X2. Y is the amount of profit that a company makes in a month. X1 is the number of months that the company has been in business. X2 is the amount spent on advertising. Consider as predictors all possible linear and quadratic terms (\(X1, X1^2, X2, X2^2\), and \(X1X2\)). Consider possible transformations of Y. Include all appropriate diagnostics. When you have found your “best” model, predict a new Y when X1 = 20 and X2 = $1,500, giving a 95% prediction interval. The data set, shown below, appears in “Profits.xlsx”.

table_2017sr1 <- readxl::read_xlsx("qe_lab/Profits_2017s.xlsx")
# table_2017sr1$X1 <- as.factor(table_2017sr1$X1)
str(table_2017sr1)
## Classes 'tbl_df', 'tbl' and 'data.frame':    25 obs. of  3 variables:
##  $ X1: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ X2: num  1928 1366 1402 1325 1561 ...
##  $ Y : num  12577 12720 13244 13741 14157 ...
summary(table_2017sr1)
##        X1           X2             Y        
##  Min.   : 1   Min.   :1091   Min.   :12577  
##  1st Qu.: 7   1st Qu.:1522   1st Qu.:14990  
##  Median :13   Median :1861   Median :16258  
##  Mean   :13   Mean   :1914   Mean   :16235  
##  3rd Qu.:19   3rd Qu.:2196   3rd Qu.:17433  
##  Max.   :25   Max.   :2975   Max.   :20396
library(ggplot2)
ggplot(table_2017sr1, aes(X2,Y,color=X1))+geom_point()+theme_light()
ggplot(table_2017sr1, aes(X1,Y,color=X2))+geom_point()+theme_light()

model_2017sr1 <- lm(Y^2~X1+X2, table_2017sr1)
# car::vif(model_2017sr1)
summary(model_2017sr1)
## 
## Call:
## lm(formula = Y^2 ~ X1 + X2, data = table_2017sr1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -30805386  -9969025   3791394  10176772  20218197 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 231392970   11535085  20.060 1.25e-15 ***
## X1            9924495     455383  21.794  < 2e-16 ***
## X2             -48414       6402  -7.563 1.48e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14900000 on 22 degrees of freedom
## Multiple R-squared:  0.956,  Adjusted R-squared:  0.952 
## F-statistic:   239 on 2 and 22 DF,  p-value: 1.194e-15
anova(model_2017sr1)
## Analysis of Variance Table
## 
## Response: Y^2
##           Df     Sum Sq    Mean Sq F value    Pr(>F)    
## X1         1 9.3403e+16 9.3403e+16 420.892 7.822e-16 ***
## X2         1 1.2692e+16 1.2692e+16  57.192 1.482e-07 ***
## Residuals 22 4.8822e+15 2.2192e+14                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model_2017sr1,c(1,3,5))
residual_2017sr1 <- rstudent(model_2017sr1)
qqnorm(residual_2017sr1)
qqline(residual_2017sr1)
olsrr::ols_plot_resid_hist(model_2017sr1)
hist(residual_2017sr1)

sqrt(predict(model_2017sr1,newdata = data.frame(X1=20,X2=1500),interval = "prediction", level = 0.95))
##        fit      lwr      upr
## 1 18901.39 18003.89 19758.17
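Because the model was fit on the \(Y^2\) scale, the interval above is obtained by taking square roots of the \(Y^2\)-scale endpoints; this preserves 95% coverage because the square root is monotone increasing on positive values. A small sanity check on the reported endpoints:

```r
# Monotone back-transform check: squaring the reported endpoints recovers an
# interval on the Y^2 scale with the same ordering, so the coverage statement
# survives the transformation.
y_int  <- c(lwr = 18003.89, fit = 18901.39, upr = 19758.17)
y2_int <- y_int^2
all(diff(y2_int) > 0)  # TRUE: ordering (and hence coverage) is preserved
```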

2017SD1

Review the data provided in ‘NBalance.xlsx’. Note that there were nine distinct treatments [Feed Rations] and three distinct animals. An experimental design was used to examine the mean differences in the nitrogen balance in ruminants. Provide the following in your answer:

  1. Which design was used, including the required parameters of the experimental design \([t; b; k; r; \lambda]\)

Candidate designs: Latin square? BIBD? (A BIBD must satisfy \(bk = tr\) and \(\lambda = r(k-1)/(t-1)\), with \(\lambda\) an integer.)

  2. An appropriate ANOVA

  3. A TukeyHSD analysis of the proper mean differences

  4. Conclusions on the impact of Feed Rations on Nitrogen Balance in Ruminants

Source: J.L. Gill (1978), Design and Analysis of Experiments in the Animal and Medical Sciences, Vol. 2. Ames, Iowa: Iowa State University Press.
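Whether a candidate block design is actually balanced can be checked mechanically from the block contents: for a BIBD, the concurrence matrix \(NN^\top\) of the treatment-by-block incidence matrix \(N\) has \(r\) on the diagonal and \(\lambda\) everywhere off it. A minimal sketch, illustrated on the classic \((t, b, k, r, \lambda) = (7, 7, 3, 3, 1)\) design as a stand-in (the NBalance block assignments are not reproduced here); the same check applies to the Ration-by-Block incidence built from ‘NBalance.xlsx’:

```r
# Balance check via the concurrence matrix N %*% t(N): diagonal entries give
# r (replications per treatment), off-diagonal entries give lambda (how often
# each pair of treatments occurs together in a block).
blocks <- list(c(1, 2, 3), c(1, 4, 5), c(1, 6, 7), c(2, 4, 6),
               c(2, 5, 7), c(3, 4, 7), c(3, 5, 6))
N <- sapply(blocks, function(b) as.integer(1:7 %in% b))  # treatment x block
C <- N %*% t(N)                                          # concurrence matrix
unique(diag(C))          # r = 3: each treatment appears in 3 blocks
unique(C[upper.tri(C)])  # lambda = 1: every pair meets in exactly 1 block
```

If the off-diagonal entries of \(NN^\top\) are not all equal, the design is not a BIBD, and the integer condition on \(\lambda = r(k-1)/(t-1)\) fails in the same way.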

## Classes 'tbl_df', 'tbl' and 'data.frame':    27 obs. of  4 variables:
##  $ Block   : Factor w/ 9 levels "Blk1","Blk2",..: 1 1 1 2 2 2 3 3 3 4 ...
##  $ Animal  : Factor w/ 3 levels "Animal1","Animal2",..: 1 2 3 1 2 3 1 2 3 1 ...
##  $ Ration  : Factor w/ 9 levels "Ration1","Ration2",..: 1 2 3 1 4 6 1 5 7 2 ...
##  $ Nitrogen: num  33.7 37.8 42.2 38.6 45.4 ...
library(ggpubr)
ggline(table_2017sd1, "Animal","Nitrogen", add = c("mean","jitter"),color = "Ration",shape = "Ration")
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced
ggline(table_2017sd1, "Block","Nitrogen", add = c("mean","jitter"),color = "Ration",shape = "Ration")
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced
ggline(table_2017sd1, "Animal","Nitrogen", add = c("mean","jitter"),color = "Block",shape = "Block")
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced
ggline(table_2017sd1, "Ration","Nitrogen", add = c("mean","jitter"),color = "Block",shape = "Block")
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced
ggline(table_2017sd1, "Ration","Nitrogen", add = c("mean","jitter"),color = "Animal",shape = "Animal")
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced
ggline(table_2017sd1, "Block","Nitrogen", add = c("mean","jitter"),color = "Animal",shape = "Animal")
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced

model_2017sd1 <- aov(Nitrogen~Animal+Ration, table_2017sd1)

summary(model_2017sd1)
##             Df Sum Sq Mean Sq F value Pr(>F)
## Animal       2  42.23   21.12   1.237  0.317
## Ration       8 274.92   34.36   2.013  0.111
## Residuals   16 273.10   17.07
anova(model_2017sd1)
## Analysis of Variance Table
## 
## Response: Nitrogen
##           Df  Sum Sq Mean Sq F value Pr(>F)
## Animal     2  42.233  21.117  1.2372 0.3165
## Ration     8 274.919  34.365  2.0133 0.1112
## Residuals 16 273.096  17.069
TukeyHSD(model_2017sd1,conf.level = 0.95)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Nitrogen ~ Animal + Ration, data = table_2017sd1)
## 
## $Animal
##                      diff       lwr      upr     p adj
## Animal2-Animal1 -1.137778 -6.163134 3.887579 0.8304041
## Animal3-Animal1  1.894444 -3.130912 6.919801 0.6039032
## Animal3-Animal2  3.032222 -1.993134 8.057579 0.2921780
## 
## $Ration
##                       diff        lwr       upr     p adj
## Ration2-Ration1  6.7192593  -5.281040 18.719558 0.5684881
## Ration3-Ration1  5.7285185  -6.271780 17.728817 0.7397514
## Ration4-Ration1  8.2318519  -3.768447 20.232151 0.3269033
## Ration5-Ration1  1.0018519 -10.998447 13.002151 0.9999968
## Ration6-Ration1  6.5937037  -5.406595 18.594003 0.5905654
## Ration7-Ration1 -1.0362963 -13.036595 10.964003 0.9999958
## Ration8-Ration1  2.0388889  -9.961410 14.039188 0.9993103
## Ration9-Ration1  3.7022222  -8.298077 15.702521 0.9663240
## Ration3-Ration2 -0.9907407 -12.991040 11.009558 0.9999970
## Ration4-Ration2  1.5125926 -10.487706 13.512892 0.9999235
## Ration5-Ration2 -5.7174074 -17.717706  6.282892 0.7415690
## Ration6-Ration2 -0.1255556 -12.125854 11.874743 1.0000000
## Ration7-Ration2 -7.7555556 -19.755854  4.244743 0.3959375
## Ration8-Ration2 -4.6803704 -16.680669  7.319929 0.8870802
## Ration9-Ration2 -3.0170370 -15.017336  8.983262 0.9901325
## Ration4-Ration3  2.5033333  -9.496966 14.503632 0.9971040
## Ration5-Ration3 -4.7266667 -16.726966  7.273632 0.8818327
## Ration6-Ration3  0.8651852 -11.135114 12.865484 0.9999990
## Ration7-Ration3 -6.7648148 -18.765114  5.235484 0.5605018
## Ration8-Ration3 -3.6896296 -15.689929  8.310669 0.9669660
## Ration9-Ration3 -2.0262963 -14.026595  9.974003 0.9993402
## Ration5-Ration4 -7.2300000 -19.230299  4.770299 0.4805005
## Ration6-Ration4 -1.6381481 -13.638447 10.362151 0.9998610
## Ration7-Ration4 -9.2681481 -21.268447  2.732151 0.2059952
## Ration8-Ration4 -6.1929630 -18.193262  5.807336 0.6609465
## Ration9-Ration4 -4.5296296 -16.529929  7.470669 0.9032344
## Ration6-Ration5  5.5918519  -6.408447 17.592151 0.7618186
## Ration7-Ration5 -2.0381481 -14.038447  9.962151 0.9993120
## Ration8-Ration5  1.0370370 -10.963262 13.037336 0.9999958
## Ration9-Ration5  2.7003704  -9.299929 14.700669 0.9951866
## Ration7-Ration6 -7.6300000 -19.630299  4.370299 0.4154158
## Ration8-Ration6 -4.5548148 -16.555114  7.445484 0.9006356
## Ration9-Ration6 -2.8914815 -14.891780  9.108817 0.9924771
## Ration8-Ration7  3.0751852  -8.925114 15.075484 0.9888726
## Ration9-Ration7  4.7385185  -7.261780 16.738817 0.8804680
## Ration9-Ration8  1.6633333 -10.336966 13.663632 0.9998443
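The common ±12.0 half-width of every Ration interval above can be reproduced by hand: each ration mean is based on 3 animals, and the error mean square is 17.069 on 16 df.

```r
# Tukey HSD half-width for the Ration comparisons:
# qtukey(0.95, nmeans, df) * sqrt(MSE / n), with 9 ration means,
# 16 residual df, MSE = 17.069, and n = 3 observations per ration.
mse <- 17.069
hsd <- qtukey(0.95, nmeans = 9, df = 16) * sqrt(mse / 3)
round(hsd, 2)  # ~12.00, the half-width of every Ration interval above
```

No pairwise ration difference exceeds this HSD, consistent with the non-significant Ration F test (p = 0.111).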

2017F

Robert Fountain*, Daniel Taylor-Rodriguez

2017F1

2016S1

Find the best model for predicting Y (weight) based on X1 (age), X2 (height), and X3 (indicator for male). Consider as predictors all possible linear and quadratic terms. Consider possible transformations of Y. Include all appropriate diagnostics. When you have found your “best” model, predict a new Y when X1 = 26, X2 = 70, and X3 = 1, giving a 95% prediction interval. The data set, shown below, appears in “RegressionFall17.xlsx”.

table_2017f1 <- readxl::read_xlsx("qe_lab/RegressionFall17.xlsx")[-1,]
table_2017f1$weight <- round(as.numeric(table_2017f1$weight),2)
table_2017f1$age <- as.numeric(table_2017f1$age)
table_2017f1$height <- round(as.numeric(table_2017f1$height),2)
table_2017f1$male <- factor(table_2017f1$male, labels = c("female", "male"))
str(table_2017f1)
## Classes 'tbl_df', 'tbl' and 'data.frame':    30 obs. of  4 variables:
##  $ weight: num  240 100 233 108 239 ...
##  $ age   : num  20 20 20 20 20 21 21 21 21 21 ...
##  $ height: num  71 67.2 68.1 67.7 68.6 65.2 67.6 67.4 67.5 69.4 ...
##  $ male  : Factor w/ 2 levels "female","male": 2 1 2 1 2 1 1 1 1 2 ...
library(ggplot2)
ggplot(table_2017f1, aes(height,weight,color=age,shape=male))+geom_point()+theme_light()

library(ggpubr)
ggline(table_2017f1, "height", "weight", add = c("mean", "jitter"), color = "age")
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced
ggline(table_2017f1, "height", "weight", add = c("mean", "jitter"), color = "male", shape = "male")
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced

model_2017f1 <- lm(weight~height*age*male,table_2017f1)
library(olsrr)
ols_step_both_aic(model_2017f1)
## Stepwise Selection Method 
## -------------------------
## 
## Candidate Terms: 
## 
## 1 . height 
## 2 . age 
## 3 . male 
## 4 . height:age 
## 5 . height:male 
## 6 . age:male 
## 7 . height:age:male 
## 
## 
## Variables Entered/Removed: 
## 
## - height:age:male added 
## - age:male added 
## - height:age added 
## 
## No more variables to be added or removed.
## 
## 
##                                      Stepwise Summary                                      
## -----------------------------------------------------------------------------------------
## Variable            Method       AIC         RSS         Sum Sq       R-Sq      Adj. R-Sq 
## -----------------------------------------------------------------------------------------
## height:age:male    addition    304.169    34051.024    163429.310    0.82757      0.81480 
## age:male           addition    303.786    29423.052    168057.281    0.85101      0.82717 
## height:age         addition    303.786    29423.052    168057.281    0.85101      0.82717 
## -----------------------------------------------------------------------------------------
ols_step_both_p(model_2017f1)
## Stepwise Selection Method   
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. height 
## 2. age 
## 3. male 
## 4. height:age 
## 5. height:male 
## 6. age:male 
## 7. height:age:male 
## 
## We are selecting variables based on p value...
## 
## Variables Entered/Removed: 
## 
## - male added 
## - age:male added 
## - age added 
## - height added 
## 
## No more variables to be added/removed.
## 
## 
## Final Model Output 
## ------------------
## 
##                          Model Summary                           
## ----------------------------------------------------------------
## R                       0.921       RMSE                 34.629 
## R-Squared               0.848       Coef. Var            20.228 
## Adj. R-Squared          0.824       MSE                1199.151 
## Pred R-Squared          0.788       MAE                  20.901 
## ----------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                  
## ----------------------------------------------------------------------
##                   Sum of                                              
##                  Squares        DF    Mean Square      F         Sig. 
## ----------------------------------------------------------------------
## Regression    167501.551         4      41875.388    34.921    0.0000 
## Residual       29978.783        25       1199.151                     
## Total         197480.333        29                                    
## ----------------------------------------------------------------------
## 
##                                       Parameter Estimates                                       
## -----------------------------------------------------------------------------------------------
##        model        Beta    Std. Error    Std. Beta      t        Sig         lower      upper 
## -----------------------------------------------------------------------------------------------
##  (Intercept)    -321.525       435.768                 -0.738    0.467    -1219.006    575.956 
##     malemale    -191.733       172.899       -1.158    -1.109    0.278     -547.826    164.360 
##          age      -1.245         5.847       -0.026    -0.213    0.833      -13.287     10.797 
##       height       6.951         5.487        0.172     1.267    0.217       -4.349     18.252 
## malemale:age      14.058         7.892        1.929     1.781    0.087       -2.195     30.311 
## -----------------------------------------------------------------------------------------------
## 
##                              Stepwise Selection Summary                               
## -------------------------------------------------------------------------------------
##                      Added/                   Adj.                                       
## Step    Variable    Removed     R-Square    R-Square     C(p)       AIC        RMSE      
## -------------------------------------------------------------------------------------
##    1      male      addition       0.795       0.788    6.9980    307.3529    38.0198    
##    2    age:male    addition       0.838       0.820    2.0100    304.2144    35.0293    
##    3      age       addition       0.838       0.820    4.0100    304.2144    35.0293    
##    4     height     addition       0.848       0.824    4.4410    304.3477    34.6288    
## -------------------------------------------------------------------------------------
model_2017f1_1 <- lm(weight~height+age:male,table_2017f1)
model_2017f1_2 <- lm(log(weight)~height+age:male,table_2017f1)
car::vif(model_2017f1_2)
##            GVIF Df GVIF^(1/(2*Df))
## height   2.8472  1        1.687365
## age:male 2.8472  2        1.298986
summary(model_2017f1_2)
## 
## Call:
## lm(formula = log(weight) ~ height + age:male, data = table_2017f1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35019 -0.06823 -0.03331  0.08138  0.70476 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    0.592786   2.103225   0.282   0.7803  
## height         0.058601   0.028409   2.063   0.0493 *
## age:malefemale 0.009281   0.021315   0.435   0.6668  
## age:malemale   0.038784   0.020107   1.929   0.0647 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1852 on 26 degrees of freedom
## Multiple R-squared:  0.8599, Adjusted R-squared:  0.8438 
## F-statistic:  53.2 on 3 and 26 DF,  p-value: 3.121e-11
anova(model_2017f1_2)
## Analysis of Variance Table
## 
## Response: log(weight)
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## height     1 4.1007  4.1007 119.540 3.203e-11 ***
## age:male   2 1.3744  0.6872  20.033 5.432e-06 ***
## Residuals 26 0.8919  0.0343                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ols_regress(model_2017f1_2)
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.927       RMSE               0.185 
## R-Squared               0.860       Coef. Var          3.679 
## Adj. R-Squared          0.844       MSE                0.034 
## Pred R-Squared          0.819       MAE                0.112 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                
## -------------------------------------------------------------------
##                Sum of                                              
##               Squares        DF    Mean Square      F         Sig. 
## -------------------------------------------------------------------
## Regression      5.475         3          1.825    53.202    0.0000 
## Residual        0.892        26          0.034                     
## Total           6.367        29                                    
## -------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##          model     Beta    Std. Error    Std. Beta      t       Sig      lower    upper 
## ----------------------------------------------------------------------------------------
##    (Intercept)    0.593         2.103                 0.282    0.780    -3.730    4.916 
##         height    0.059         0.028        0.255    2.063    0.049     0.000    0.117 
## age:malefemale    0.009         0.021        0.223    0.435    0.667    -0.035    0.053 
##   age:malemale    0.039         0.020        0.937    1.929    0.065    -0.003    0.080 
## ----------------------------------------------------------------------------------------
plot(model_2017f1_2)

predict(model_2017f1_2, newdata=data.frame(age= 26, height= 70, male= "male"),interval = "prediction",level = 0.95)
##        fit      lwr      upr
## 1 5.703233 5.278615 6.127851
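Because model_2017f1_2 has log(weight) as its response, the fit and interval above are on the log scale. Exponentiating the printed endpoints returns them to the original weight units (valid for prediction-interval endpoints, since exp is monotone):

```r
# Back-transform the log-scale prediction interval (values copied from the output above)
exp(c(fit = 5.703233, lwr = 5.278615, upr = 6.127851))
# fit ~ 299.8, lwr ~ 196.1, upr ~ 458.4
```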

2017F2

A process engineer is testing the yield of a product manufactured on three specific machines. Each machine can be operated at fixed high and low power settings, although the actual settings differ from one machine to the next. Furthermore, a machine has three stations on which the product is formed, and these are the same for each machine. An experiment is conducted in which each machine is tested at both power settings, and three observations on yield are taken from each station. The runs are made in random order. Analyze this experiment. The data set, shown below, appears in “DesignFall17.xlsx”.

DesignFall17 <- readxl::read_excel("qe_lab/DesignFall17.xlsx")
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * … and 4 more problems
library(tidyverse)
table_2017f2 <- gather(DesignFall17[c(2:4,6:8),c(2:4,6:8,10:12)])
names(table_2017f2)<- c("machine","y")
table_2017f2<- table_2017f2[c("y","machine")]
table_2017f2$machine <- as.factor(c(rep("machine1",18),rep("machine2",18),rep("machine3",18)))
table_2017f2$station <- as.factor(rep(c(rep("station1",6),rep("station2",6),rep("station3",6)),3))
table_2017f2$power <- as.factor(rep(c(rep("power1",3),rep("power2",3)),9))
str(table_2017f2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    54 obs. of  4 variables:
##  $ y      : num  35.1 31.3 32.6 24.3 26.3 27.1 34.7 35.9 36 28.1 ...
##  $ machine: Factor w/ 3 levels "machine1","machine2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ station: Factor w/ 3 levels "station1","station2",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ power  : Factor w/ 2 levels "power1","power2": 1 1 1 2 2 2 1 1 1 2 ...
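A quick sanity check on the reshaping above (a sketch, run in the same session, assuming `table_2017f2` as built here): every machine-by-station-by-power cell should contain exactly the 3 replicate observations described in the problem, for 54 runs in total.

```r
# Cross-tabulate the factor combinations; every cell count should equal 3
with(table_2017f2, table(machine, station, power))
```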

model_2017f2 <- lm(y~power*station*machine, table_2017f2)
library(olsrr)
# ols_step_both_aic(model_2017f2)
# ols_step_both_p(model_2017f2)
model_2017f2_2 <- lm(y~machine+power:machine+power:station:machine, table_2017f2)
summary(model_2017f2_2)
## 
## Call:
## lm(formula = y ~ machine + power:machine + power:station:machine, 
##     data = table_2017f2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0000 -0.6500  0.1000  0.7583  2.2000 
## 
## Coefficients:
##                                               Estimate Std. Error t value
## (Intercept)                                  3.300e+01  7.562e-01  43.639
## machinemachine2                              1.367e+00  1.069e+00   1.278
## machinemachine3                             -2.546e-14  1.069e+00   0.000
## machinemachine1:powerpower2                 -7.100e+00  1.069e+00  -6.639
## machinemachine2:powerpower2                 -9.233e+00  1.069e+00  -8.634
## machinemachine3:powerpower2                 -7.800e+00  1.069e+00  -7.294
## machinemachine1:powerpower1:stationstation2  2.533e+00  1.069e+00   2.369
## machinemachine2:powerpower1:stationstation2  1.033e+00  1.069e+00   0.966
## machinemachine3:powerpower1:stationstation2  3.333e-01  1.069e+00   0.312
## machinemachine1:powerpower2:stationstation2  2.767e+00  1.069e+00   2.587
## machinemachine2:powerpower2:stationstation2  5.667e-01  1.069e+00   0.530
## machinemachine3:powerpower2:stationstation2  1.000e+00  1.069e+00   0.935
## machinemachine1:powerpower1:stationstation3  4.700e+00  1.069e+00   4.395
## machinemachine2:powerpower1:stationstation3  1.200e+00  1.069e+00   1.122
## machinemachine3:powerpower1:stationstation3 -3.000e-01  1.069e+00  -0.281
## machinemachine1:powerpower2:stationstation3 -3.333e-01  1.069e+00  -0.312
## machinemachine2:powerpower2:stationstation3  5.333e-01  1.069e+00   0.499
## machinemachine3:powerpower2:stationstation3 -1.367e+00  1.069e+00  -1.278
##                                             Pr(>|t|)    
## (Intercept)                                  < 2e-16 ***
## machinemachine2                               0.2095    
## machinemachine3                               1.0000    
## machinemachine1:powerpower2                 9.82e-08 ***
## machinemachine2:powerpower2                 2.70e-10 ***
## machinemachine3:powerpower2                 1.36e-08 ***
## machinemachine1:powerpower1:stationstation2   0.0233 *  
## machinemachine2:powerpower1:stationstation2   0.3404    
## machinemachine3:powerpower1:stationstation2   0.7571    
## machinemachine1:powerpower2:stationstation2   0.0139 *  
## machinemachine2:powerpower2:stationstation2   0.5995    
## machinemachine3:powerpower2:stationstation2   0.3560    
## machinemachine1:powerpower1:stationstation3 9.38e-05 ***
## machinemachine2:powerpower1:stationstation3   0.2693    
## machinemachine3:powerpower1:stationstation3   0.7807    
## machinemachine1:powerpower2:stationstation3   0.7571    
## machinemachine2:powerpower2:stationstation3   0.6210    
## machinemachine3:powerpower2:stationstation3   0.2095    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.31 on 36 degrees of freedom
## Multiple R-squared:  0.9486, Adjusted R-squared:  0.9243 
## F-statistic: 39.08 on 17 and 36 DF,  p-value: < 2.2e-16
anova(model_2017f2_2)
## Analysis of Variance Table
## 
## Response: y
##                       Df  Sum Sq Mean Sq  F value    Pr(>F)    
## machine                2   37.37   18.68  10.8913  0.000200 ***
## machine:power          3 1039.51  346.50 201.9765 < 2.2e-16 ***
## machine:power:station 12   62.79    5.23   3.0501  0.004742 ** 
## Residuals             36   61.76    1.72                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# When some factors are random: GAD approach
# (template snippet -- note it uses the objects from the 2019S2 problem, not this one)
library(GAD)
table_2019s2$Run_r <- as.random(table_2019s2$Run)
table_2019s2$Trt_f <- as.fixed(table_2019s2$Trt)
table_2019s2$Rev_f <- as.fixed(table_2019s2$Rev)
model_2019s2_1 <- aov(Shrink ~ Run_r + Trt_f + Trt_f %in% Run_r + Rev_f %in% Run_r + Rev_f + Trt_f:Rev_f, data = table_2019s2)
pander::pander(gad(model_2019s2_1))
# When some factors are random: lme4 mixed-model approach
library("lme4")
model_2017f2_3 <- lmer(formula = y ~ (1|machine) + station + power + (1|machine:station) + (1|machine:station:power), data = table_2017f2, REML = TRUE)
summary(model_2017f2_3)$varcor
pander::pander(confint(model_2017f2_3)[1:4, 1:2])

2018S

Robert Fountain*, Daniel Taylor-Rodriguez

2018S1

The data for this problem were obtained from research relating smoking in children to pulmonary function. Today it is well established that smoking cigarettes is a very unhealthy habit, especially for children; however, this was not well-known in the past. These data come from one of the first studies of the effects of smoking on pulmonary (i.e., lung) function, an observational study of 654 youths aged 3 to 19. The variables in the study are displayed in Table 1 below. The outcome variable is volume, which measures the liters of air exhaled by the child in the first second of a forced breath. Some evidence in the literature suggests that children under age 6 may not understand the instructions of the breath exhalation test, so that the quality of volume measurements for those children is suspect. We are interested in the relationship between smoking, gender, and the volume of air exhaled. Smoking is expected to impair pulmonary function (i.e., decrease volume).

Find the best model to predict volume considering as predictors all possible linear, quadratic and pairwise interaction terms. Additionally, consider possible transformations of the response (i.e., volume), and include all relevant diagnostic measures. Once you select the best model, write down and test the hypothesis to determine whether volume is influenced by smoking status in terms of your best model’s parameters. Using this same model, predict the volume for a 16-year-old male smoker who is 61 inches tall, and provide a 95% prediction interval. A description of the variables is found in the table below, and the data is included in the file Problem1_ChildSmoking.xlsx.

Variable Name and Description

age: age of child in years

volume: volume of air in exhaled breath in liters

height: height of child in inches

male=1 if child is male, and =0 otherwise

smoker=1 if child reports that he or she smokes cigarettes regularly, and =0 otherwise

table_2018s1 <- readxl::read_xlsx("qe_lab/Problem1_ChildSmoking.xlsx")
table_2018s1_above6 <- table_2018s1[which(table_2018s1$age>5),]
table_2018s1_above6$age <- factor(table_2018s1_above6$age)
table_2018s1_above6$male <- factor(table_2018s1_above6$male, labels = c("female","male"))
table_2018s1_above6$smoker <- factor(table_2018s1_above6$smoker, labels = c("not regu","regularly"))
str(table_2018s1)
## Classes 'tbl_df', 'tbl' and 'data.frame':    654 obs. of  5 variables:
##  $ age   : num  9 8 7 9 9 8 6 6 8 9 ...
##  $ volume: num  1.71 1.72 1.72 1.56 1.9 ...
##  $ height: num  57 67.5 54.5 53 57 61 58 56 58.5 60 ...
##  $ male  : num  0 0 0 1 1 0 0 0 0 0 ...
##  $ smoker: num  0 0 0 0 0 0 0 0 0 0 ...
str(table_2018s1_above6)
## Classes 'tbl_df', 'tbl' and 'data.frame':    615 obs. of  5 variables:
##  $ age   : Factor w/ 14 levels "6","7","8","9",..: 4 3 2 4 4 3 1 1 3 4 ...
##  $ volume: num  1.71 1.72 1.72 1.56 1.9 ...
##  $ height: num  57 67.5 54.5 53 57 61 58 56 58.5 60 ...
##  $ male  : Factor w/ 2 levels "female","male": 1 1 1 2 2 1 1 1 1 1 ...
##  $ smoker: Factor w/ 2 levels "not regu","regularly": 1 1 1 1 1 1 1 1 1 1 ...
summary(table_2018s1$height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   46.00   57.00   61.50   61.14   65.50   74.00
## Warning in stats::qt(ci/2 + 0.5, data_sum$length - 1): NaNs produced
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

model_2018s1 <- lm(volume~height*age*male*smoker,table_2018s1_above6)
ols_step_both_aic(model_2018s1)
ols_step_both_p(model_2018s1)
model_2018s1_2 <- lm(log(volume)~log(height):age:male+smoker,table_2018s1_above6)
summary(model_2018s1_2)
## 
## Call:
## lm(formula = log(volume) ~ log(height):age:male + smoker, data = table_2018s1_above6)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.53620 -0.08805  0.01031  0.08931  0.32758 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -9.55944    0.50972 -18.754   <2e-16 ***
## smokerregularly              -0.04161    0.02162  -1.925   0.0547 .  
## log(height):age6:malefemale   2.52311    0.12828  19.669   <2e-16 ***
## log(height):age7:malefemale   2.53067    0.12722  19.892   <2e-16 ***
## log(height):age8:malefemale   2.52611    0.12503  20.205   <2e-16 ***
## log(height):age9:malefemale   2.53901    0.12444  20.403   <2e-16 ***
## log(height):age10:malefemale  2.55392    0.12371  20.644   <2e-16 ***
## log(height):age11:malefemale  2.55951    0.12321  20.773   <2e-16 ***
## log(height):age12:malefemale  2.56509    0.12310  20.838   <2e-16 ***
## log(height):age13:malefemale  2.56979    0.12291  20.907   <2e-16 ***
## log(height):age14:malefemale  2.54910    0.12263  20.788   <2e-16 ***
## log(height):age15:malefemale  2.54670    0.12327  20.659   <2e-16 ***
## log(height):age16:malefemale  2.56462    0.12322  20.813   <2e-16 ***
## log(height):age17:malefemale  2.61978    0.12811  20.450   <2e-16 ***
## log(height):age18:malefemale  2.56344    0.12435  20.615   <2e-16 ***
## log(height):age19:malefemale  2.58822    0.12449  20.791   <2e-16 ***
## log(height):age6:malemale     2.53430    0.12861  19.705   <2e-16 ***
## log(height):age7:malemale     2.54115    0.12729  19.964   <2e-16 ***
## log(height):age8:malemale     2.53915    0.12608  20.139   <2e-16 ***
## log(height):age9:malemale     2.54426    0.12419  20.487   <2e-16 ***
## log(height):age10:malemale    2.54385    0.12324  20.642   <2e-16 ***
## log(height):age11:malemale    2.55859    0.12183  21.002   <2e-16 ***
## log(height):age12:malemale    2.56512    0.12138  21.133   <2e-16 ***
## log(height):age13:malemale    2.58523    0.12074  21.412   <2e-16 ***
## log(height):age14:malemale    2.58262    0.12086  21.368   <2e-16 ***
## log(height):age15:malemale    2.60392    0.12106  21.509   <2e-16 ***
## log(height):age16:malemale    2.59467    0.12097  21.449   <2e-16 ***
## log(height):age17:malemale    2.59767    0.12076  21.511   <2e-16 ***
## log(height):age18:malemale    2.60975    0.12237  21.327   <2e-16 ***
## log(height):age19:malemale    2.61631    0.12363  21.163   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1404 on 585 degrees of freedom
## Multiple R-squared:  0.7988, Adjusted R-squared:  0.7888 
## F-statistic: 80.08 on 29 and 585 DF,  p-value: < 2.2e-16
anova(model_2018s1_2)
## Analysis of Variance Table
## 
## Response: log(volume)
##                       Df Sum Sq Mean Sq F value    Pr(>F)    
## smoker                 1  3.197  3.1974 162.111 < 2.2e-16 ***
## log(height):age:male  28 42.605  1.5216  77.148 < 2.2e-16 ***
## Residuals            585 11.538  0.0197                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(olsrr)
ols_regress(model_2018s1_2)
##                         Model Summary                          
## --------------------------------------------------------------
## R                       0.894       RMSE                0.140 
## R-Squared               0.799       Coef. Var          14.772 
## Adj. R-Squared          0.789       MSE                 0.020 
## Pred R-Squared           -Inf       MAE                 0.108 
## --------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                 
## --------------------------------------------------------------------
##                Sum of                                               
##               Squares         DF    Mean Square      F         Sig. 
## --------------------------------------------------------------------
## Regression     45.803         29          1.579    80.078    0.0000 
## Residual       11.538        585          0.020                     
## Total          57.341        614                                    
## --------------------------------------------------------------------
## 
##                                             Parameter Estimates                                             
## -----------------------------------------------------------------------------------------------------------
##                        model      Beta    Std. Error    Std. Beta       t        Sig       lower     upper 
## -----------------------------------------------------------------------------------------------------------
##                  (Intercept)    -9.559         0.510                 -18.754    0.000    -10.561    -8.558 
##              smokerregularly    -0.042         0.022       -0.042     -1.925    0.055     -0.084     0.001 
##  log(height):age6:malefemale     2.523         0.128        5.078     19.669    0.000      2.271     2.775 
##  log(height):age7:malefemale     2.531         0.127        7.048     19.892    0.000      2.281     2.781 
##  log(height):age8:malefemale     2.526         0.125        8.879     20.205    0.000      2.281     2.772 
##  log(height):age9:malefemale     2.539         0.124        8.785     20.403    0.000      2.295     2.783 
## log(height):age10:malefemale     2.554         0.124        7.886     20.644    0.000      2.311     2.797 
## log(height):age11:malefemale     2.560         0.123        9.041     20.773    0.000      2.318     2.802 
## log(height):age12:malefemale     2.565         0.123        7.386     20.838    0.000      2.323     2.807 
## log(height):age13:malefemale     2.570         0.123        6.778     20.907    0.000      2.328     2.811 
## log(height):age14:malefemale     2.549         0.123        4.189     20.788    0.000      2.308     2.790 
## log(height):age15:malefemale     2.547         0.123        4.388     20.659    0.000      2.305     2.789 
## log(height):age16:malefemale     2.565         0.123        3.442     20.813    0.000      2.323     2.807 
## log(height):age17:malefemale     2.620         0.128        1.427     20.450    0.000      2.368     2.871 
## log(height):age18:malefemale     2.563         0.124        2.428     20.615    0.000      2.319     2.808 
## log(height):age19:malefemale     2.588         0.124        2.020     20.791    0.000      2.344     2.833 
##    log(height):age6:malemale     2.534         0.129        6.119     19.705    0.000      2.282     2.787 
##    log(height):age7:malemale     2.541         0.127        6.591     19.964    0.000      2.291     2.791 
##    log(height):age8:malemale     2.539         0.126        8.200     20.139    0.000      2.292     2.787 
##    log(height):age9:malemale     2.544         0.124        9.353     20.487    0.000      2.300     2.788 
##   log(height):age10:malemale     2.544         0.123        9.162     20.642    0.000      2.302     2.786 
##   log(height):age11:malemale     2.559         0.122        9.139     21.002    0.000      2.319     2.798 
##   log(height):age12:malemale     2.565         0.121        7.366     21.133    0.000      2.327     2.804 
##   log(height):age13:malemale     2.585         0.121        6.200     21.412    0.000      2.348     2.822 
##   log(height):age14:malemale     2.583         0.121        5.695     21.368    0.000      2.345     2.820 
##   log(height):age15:malemale     2.604         0.121        4.334     21.509    0.000      2.366     2.842 
##   log(height):age16:malemale     2.595         0.121        3.825     21.449    0.000      2.357     2.832 
##   log(height):age17:malemale     2.598         0.121        3.833     21.511    0.000      2.361     2.835 
##   log(height):age18:malemale     2.610         0.122        2.517     21.327    0.000      2.369     2.850 
##   log(height):age19:malemale     2.616         0.124        1.476     21.163    0.000      2.373     2.859 
## -----------------------------------------------------------------------------------------------------------
plot(model_2018s1_2)
## Warning: not plotting observations with leverage one:
##   570, 591

\(\ln(Y)=\mu+\beta_{1,jk}\ln(H)+\beta_2\,\text{Smoker}+\varepsilon\), where \(\beta_{1,jk}\) is a separate slope on \(\ln(H)\) for each age \(j\) by sex \(k\) combination (the log(height):age:male term).

\(H_0: \beta_2=0\), \(H_1: \beta_2\neq0\)
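The t-test on `smokerregularly` in the summary above (p = 0.0547) tests this hypothesis directly; equivalently, one can fit the reduced model without smoker and run a partial F-test against model_2018s1_2 (a sketch, assuming the fitted objects from this session; with a single dropped coefficient the F statistic is the square of the t statistic):

```r
# Partial F-test of H0: beta_2 = 0 (smoker effect), given the log(height):age:male slopes
model_2018s1_reduced <- lm(log(volume) ~ log(height):age:male, table_2018s1_above6)
anova(model_2018s1_reduced, model_2018s1_2)
```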

predict(model_2018s1_2, newdata =data.frame(age="16",male="male",smoker="regularly",height=61), interval = "prediction", level=0.95)
##        fit       lwr      upr
## 1 1.065319 0.7691739 1.361463
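The response in model_2018s1_2 is log(volume), so the prediction above is on the log scale; exponentiating the printed endpoints gives the predicted volume in liters:

```r
# Back-transform the log-scale prediction interval to liters (values copied from the output above)
exp(c(fit = 1.065319, lwr = 0.7691739, upr = 1.361463))
# fit ~ 2.90, lwr ~ 2.16, upr ~ 3.90 liters
```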

2018S2

[RCBD]

An experiment is conducted to assess the effect of shipping and storage on the acceptability of avocados. Three shipping methods (labeled 1, 2 and 3) and two storage methods (labeled 1 and 2) were considered. Each combination of shipping × storage was applied to a group of four crates. Additionally, three different shipments were made. The experiment’s configuration is shown below. Analyze this experiment. The data set can be found in the file Problem2_Avocado.xlsx.

## Classes 'tbl_df', 'tbl' and 'data.frame':    72 obs. of  4 variables:
##  $ Block   : Factor w/ 3 levels "Blk1","Blk2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Shipping: Factor w/ 3 levels "Ship1","Ship2",..: 1 1 1 1 1 1 1 1 2 2 ...
##  $ Storage : Factor w/ 2 levels "Stor1","Stor2": 1 1 1 1 2 2 2 2 1 1 ...
##  $ Y       : num  73.3 66.6 61.6 64 53 ...

## 
## Call:
## lm(formula = Y ~ Block * (Storage + Shipping), data = table_2018s2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2704 -2.8865 -0.0842  2.2082  8.9433 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               68.476      1.735  39.478  < 2e-16 ***
## BlockBlk2                 -4.170      2.453  -1.700  0.09436 .  
## BlockBlk3                  5.370      2.453   2.189  0.03248 *  
## StorageStor2             -19.573      1.735 -11.284  < 2e-16 ***
## ShippingShip2              3.776      2.124   1.778  0.08054 .  
## ShippingShip3              9.217      2.124   4.339 5.58e-05 ***
## BlockBlk2:StorageStor2    39.227      2.453  15.991  < 2e-16 ***
## BlockBlk3:StorageStor2    38.242      2.453  15.589  < 2e-16 ***
## BlockBlk2:ShippingShip2   -7.786      3.004  -2.592  0.01198 *  
## BlockBlk3:ShippingShip2   -9.112      3.004  -3.033  0.00357 ** 
## BlockBlk2:ShippingShip3  -17.272      3.004  -5.749 3.21e-07 ***
## BlockBlk3:ShippingShip3  -21.206      3.004  -7.059 1.99e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.249 on 60 degrees of freedom
## Multiple R-squared:  0.9054, Adjusted R-squared:  0.8881 
## F-statistic: 52.23 on 11 and 60 DF,  p-value: < 2.2e-16
## Analysis of Variance Table
## 
## Response: Y
##                Df Sum Sq Mean Sq  F value    Pr(>F)    
## Block           2 2483.3 1241.65  68.7810 2.975e-16 ***
## Storage         1  703.2  703.19  38.9529 4.841e-08 ***
## Shipping        2  156.3   78.16   4.3297   0.01752 *  
## Block:Storage   2 6004.3 3002.13 166.3021 < 2.2e-16 ***
## Block:Shipping  4 1024.0  256.00  14.1809 3.329e-08 ***
## Residuals      60 1083.1   18.05                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

library(emmeans)
Blk_Stor <- pairs(lsmeans(model_2018s2,~Block|Storage))
Stor_Blk <- pairs(lsmeans(model_2018s2,~Storage|Block))

Blk_Ship <- pairs(lsmeans(model_2018s2,~Block|Shipping))
Ship_Blk <- pairs(lsmeans(model_2018s2,~Shipping|Block))

Stor_Ship <- pairs(lsmeans(model_2018s2,~Storage|Shipping))
Ship_Stor <- pairs(lsmeans(model_2018s2,~Shipping|Storage))

library(kableExtra)
kable(test(rbind(Blk_Stor,Stor_Blk),adjust="tukey"),format="latex")%>%kable_styling("condensed",full_width=F,font_size = 8)%>%row_spec(c(2),bold =T)%>%row_spec(c(2),background ="#EAFAF1")
kable(test(rbind(Blk_Ship,Ship_Blk),adjust="tukey"),format="latex")%>%kable_styling("condensed",full_width=F,font_size = 8)%>%row_spec(c(7:9,11,14,17,18),bold =T)%>%row_spec(c(7:9,11,14,17,18),background ="#EAFAF1")
kable(test(rbind(Stor_Ship,Ship_Stor),adjust="tukey"),format="latex")%>%kable_styling("condensed",full_width=F,font_size = 8)%>%row_spec(c(1:3,5,8),bold =T)%>%row_spec(c(1:3,5,8),background ="#EAFAF1")

2018F

Robert Fountain*, Daniel Taylor-Rodriguez

2018F1

2015F1 [2017S1]

Find the best model for predicting Y based on X1 and X2. Y is the amount of profit that a company makes in a month. X1 is the number of months that the company has been in business. X2 is the amount spent on advertising. Consider as predictors all possible linear and quadratic terms (\(X1, X1^2, X2, X2^2\), and \(X1X2\)). Consider possible transformations of Y. Include all appropriate diagnostics. When you have found your “best” model, predict a new Y when X1 = 20 and X2 = $1,900, giving a 95% prediction interval. The data set, shown below, appears in “Profits.xlsx”.

## Classes 'tbl_df', 'tbl' and 'data.frame':    25 obs. of  3 variables:
##  $ X1: num  1 2 3 4 5 6 7 8 9 10 ...
##  $ X2: num  1928 1366 1402 1325 1561 ...
##  $ Y : num  16624 17082 16486 14435 17922 ...

model_2018f1 <- lm(Y~X1/X2, table_2018f1) # the nesting operator X1/X2 expands to X1 + X1:X2 (no X2 main effect)
summary(model_2018f1)
## 
## Call:
## lm(formula = Y ~ X1/X2, data = table_2018f1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2404.22  -904.25    51.59   918.49  1821.40 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15080.0896   517.9820  29.113  < 2e-16 ***
## X1            647.7910    79.5268   8.146 4.37e-08 ***
## X1:X2          -0.1570     0.0322  -4.875 7.13e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1241 on 22 degrees of freedom
## Multiple R-squared:  0.818,  Adjusted R-squared:  0.8014 
## F-statistic: 49.43 on 2 and 22 DF,  p-value: 7.266e-09
library(olsrr)
ols_regress(model_2018f1)
##                            Model Summary                            
## -------------------------------------------------------------------
## R                       0.904       RMSE                  1240.942 
## R-Squared               0.818       Coef. Var                6.413 
## Adj. R-Squared          0.801       MSE                1539937.532 
## Pred R-Squared          0.754       MAE                    990.759 
## -------------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                   ANOVA                                    
## --------------------------------------------------------------------------
##                      Sum of                                               
##                     Squares        DF     Mean Square      F         Sig. 
## --------------------------------------------------------------------------
## Regression    152247073.255         2    76123536.628    49.433    0.0000 
## Residual       33878625.705        22     1539937.532                     
## Total         186125698.960        24                                     
## --------------------------------------------------------------------------
## 
##                                        Parameter Estimates                                        
## -------------------------------------------------------------------------------------------------
##       model         Beta    Std. Error    Std. Beta      t        Sig         lower        upper 
## -------------------------------------------------------------------------------------------------
## (Intercept)    15080.090       517.982                 29.113    0.000    14005.861    16154.318 
##          X1      647.791        79.527        1.712     8.146    0.000      482.863      812.720 
##       X1:X2       -0.157         0.032       -1.025    -4.875    0.000       -0.224       -0.090 
## -------------------------------------------------------------------------------------------------
car::Anova(model_2018f1)
## Anova Table (Type II tests)
## 
## Response: Y
##              Sum Sq Df F value    Pr(>F)    
## X1        115643157  1  75.096 1.527e-08 ***
## X1:X2      36603916  1  23.770 7.127e-05 ***
## Residuals  33878626 22                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
car::vif(model_2018f1)
##       X1    X1:X2 
## 5.339091 5.339091
anova(model_2018f1)
## Analysis of Variance Table
## 
## Response: Y
##           Df    Sum Sq   Mean Sq F value    Pr(>F)    
## X1         1 115643157 115643157  75.096 1.527e-08 ***
## X1:X2      1  36603916  36603916  23.770 7.127e-05 ***
## Residuals 22  33878626   1539938                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model_2018f1)

predict(model_2018f1, newdata = data.frame(X1=20,X2=1900), interval = "prediction", level=0.95)
##        fit      lwr      upr
## 1 22070.41 19385.27 24755.55

2018F2

2015F2 [7.4] [8.E.10]

A replicated fractional factorial design is used to investigate the effect of four factors on the free height of leaf springs used in an automotive application. The factors are (A) furnace temperature, (B) heating time, (C) transfer time, and (D) hold down time. There are 3 observations at each setting.

Write out the alias structure for this design. What is the resolution of this design?

I = ABCD; AB = CD, AC = BD, BC = AD; A = BCD, B = ACD, C = ABD, D = ABC. The shortest word in the defining relation has length four, so the design is resolution IV.
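The alias structure can be confirmed directly in R by generating the half fraction from its generator and comparing contrast columns (a small verification sketch, not part of the original solution):

```r
# Build the 2^(4-1) design from generator D = ABC (so I = ABCD) and
# confirm the alias pairs by comparing contrast columns directly.
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
d$D <- with(d, A * B * C)                  # generator: D = ABC
stopifnot(all(d$A * d$B == d$C * d$D))     # AB = CD
stopifnot(all(d$A * d$C == d$B * d$D))     # AC = BD
stopifnot(all(d$B * d$C == d$A * d$D))     # BC = AD
stopifnot(all(d$A == d$B * d$C * d$D))     # A = BCD: main effects alias only
                                           # three-factor terms (resolution IV)
```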

Analyze the data. What factors influence the mean free height? The data set appears in the file “Springs.xlsx”.

A, B

table_2018f2 <- readxl::read_xlsx("qe_lab/Springs_2018f.xlsx")
table_2018f2 <- table_2018f2[order(table_2018f2$D ,table_2018f2$C ,table_2018f2$B,table_2018f2$A),]
str(table_2018f2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    48 obs. of  5 variables:
##  $ A      : num  -1 -1 -1 -1 -1 -1 1 1 1 1 ...
##  $ B      : num  -1 -1 -1 -1 -1 -1 1 1 1 1 ...
##  $ C      : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ D      : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ Heights: num  8.56 8 8.56 7.5 8.62 7.24 8.18 8.26 8.12 8.5 ...
kableExtra::kable(table_2018f2)
A B C D Heights
-1 -1 -1 -1 8.56
-1 -1 -1 -1 8.00
-1 -1 -1 -1 8.56
-1 -1 -1 -1 7.50
-1 -1 -1 -1 8.62
-1 -1 -1 -1 7.24
1 1 -1 -1 8.18
1 1 -1 -1 8.26
1 1 -1 -1 8.12
1 1 -1 -1 8.50
1 1 -1 -1 8.50
1 1 -1 -1 8.12
1 -1 1 -1 8.38
1 -1 1 -1 8.12
1 -1 1 -1 9.18
1 -1 1 -1 8.38
1 -1 1 -1 9.12
1 -1 1 -1 8.24
-1 1 1 -1 8.12
-1 1 1 -1 7.36
-1 1 1 -1 8.04
-1 1 1 -1 7.36
-1 1 1 -1 7.88
-1 1 1 -1 7.50
1 -1 -1 1 9.30
1 -1 -1 1 8.76
1 -1 -1 1 9.36
1 -1 -1 1 8.76
1 -1 -1 1 8.76
1 -1 -1 1 7.88
-1 1 -1 1 8.00
-1 1 -1 1 8.00
-1 1 -1 1 8.12
-1 1 -1 1 8.12
-1 1 -1 1 8.00
-1 1 -1 1 8.00
-1 -1 1 1 8.08
-1 -1 1 1 7.64
-1 -1 1 1 9.00
-1 -1 1 1 7.88
-1 -1 1 1 8.76
-1 -1 1 1 7.88
1 1 1 1 8.12
1 1 1 1 8.62
1 1 1 1 8.62
1 1 1 1 8.00
1 1 1 1 8.38
1 1 1 1 8.18
model_2018f2 <- aov(Heights~A*B*C*D, table_2018f2)
model_2018f2$coefficients
## (Intercept)           A           B           C           D         A:B 
##  8.25125000  0.24208333 -0.16375000 -0.04958333  0.09125000 -0.02958333 
##         A:C         B:C         A:D         B:D         C:D       A:B:C 
##  0.00125000 -0.02291667          NA          NA          NA          NA 
##       A:B:D       A:C:D       B:C:D     A:B:C:D 
##          NA          NA          NA          NA
library(daewr)
## Registered S3 method overwritten by 'partitions':
##   method            from
##   print.equivalence lava
## Registered S3 method overwritten by 'DoE.base':
##   method           from       
##   factorize.factor conf.design
## 
## Attaching package: 'daewr'
## The following object is masked from 'package:olsrr':
## 
##     cement
## The following object is masked from 'package:lme4':
## 
##     cake
## The following object is masked from 'package:magrittr':
## 
##     mod
halfnorm(model_2018f2$coefficients[2:8],alpha=1)

## zscore= 0.08964235 0.27188 0.4637078 0.6744898 0.920823 1.241867 1.802743
## effp= 0.00125 0.02291667 0.02958333 0.04958333 0.09125 0.16375 0.2420833
summary(model_2018f2)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1  2.813  2.8130  16.359 0.000232 ***
## B            1  1.287  1.2871   7.485 0.009232 ** 
## C            1  0.118  0.1180   0.686 0.412346    
## D            1  0.400  0.3997   2.324 0.135232    
## A:B          1  0.042  0.0420   0.244 0.623819    
## A:C          1  0.000  0.0001   0.000 0.983441    
## B:C          1  0.025  0.0252   0.147 0.703832    
## Residuals   40  6.878  0.1720                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_2018f2_2 <- aov(Heights~A+B, table_2018f2)
summary(model_2018f2_2)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1  2.813  2.8130  16.962 0.000161 ***
## B            1  1.287  1.2871   7.761 0.007788 ** 
## Residuals   45  7.463  0.1658                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(gghalfnorm)
# Half-normal plot of the effect estimates; plotting the raw Heights would
# not screen effects.
gghalfnorm(x = model_2018f2$coefficients[2:8], nlab = 3) + ggplot2::theme_light()

2019S

Robert Fountain*, Daniel Taylor-Rodriguez

Instructions:

  1. Two 8.5’’ x 11’’ pages of notes (front and back) are allowed.

  2. Perform the statistical analysis in your software of preference for the two problems below. The data sets for each problem are on the flash drive provided. Create a Word or PDF document with your findings. Save the document to the flash drive with your name as the file name. You may use scratch paper during the exam, but everything you want considered for grading must be included in your document. Additionally, you must copy and paste the code used for the analysis at the end of the Word/PDF document you submit.

  3. For each question, discuss all relevant aspects of your analysis (exploratory and modeling), supporting them with the graphical and numerical summaries that are important for communicating results. The report should also include a discussion of diagnostics and model adequacy, and the rationale for any transformations or other key modeling decisions, with interpretations of the output written so that a statistically literate person can understand and apply the findings in each case.

2019S1

[4.2.1 PRESS residuals]

The goal of this exercise is to find the best model for predicting (out-of-sample) Y based on the continuous variables x1, x2, x3, and on the binary variables A and B. The data set is in the dataset “ModelBuildingData.xlsx”. Consider possible transformations of Y, and for the linear predictor consider 2-way interactions and quadratic terms. Include all appropriate diagnostics, and make any necessary adjustments to the data so model assumptions are met.
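As a sketch of the candidate model space, the full linear predictor (all 2-way interactions plus quadratic terms for the continuous variables) can be written in one R formula; the log response shown is just one transformation to consider, and the variable names follow the problem statement:

```r
# Full candidate linear predictor: 5 main effects, all 10 pairwise
# interactions, and quadratics for the 3 continuous variables.
f <- log(y) ~ (x1 + x2 + x3 + A + B)^2 + I(x1^2) + I(x2^2) + I(x3^2)
length(attr(terms(f), "term.labels"))
## [1] 18
```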

Use only the first 250 observations for model training (i.e., selection, fitting, and diagnostics). With your top model, obtain predictions for the 250 remaining observations (the hold-out sample), along with their corresponding 95% prediction intervals. Finally, calculate and interpret (in terms of the model's predictive ability) the Prediction Root Mean Square Error (PRMSE), as follows:

table_2019s1 <- readxl::read_xlsx("qe_lab/ModelBuildingData.xlsx")
str(table_2019s1)
## Classes 'tbl_df', 'tbl' and 'data.frame':    500 obs. of  6 variables:
##  $ y : num  0.858 1.07 0.782 1.195 1.065 ...
##  $ x1: num  -0.469 0.679 -1.802 -0.386 0.576 ...
##  $ x2: num  -1.903 -0.615 0.64 -1.82 -0.401 ...
##  $ x3: num  0.595 -1.845 -0.521 0.55 -1.775 ...
##  $ A : chr  "a1" "a1" "a1" "a1" ...
##  $ B : chr  "b1" "b1" "b1" "b2" ...
dplyr::glimpse(table_2019s1)
## Observations: 500
## Variables: 6
## $ y  <dbl> 0.8575377, 1.0700376, 0.7816973, 1.1954874, 1.0648021, 0.7508…
## $ x1 <dbl> -0.4686792, 0.6794381, -1.8021745, -0.3855567, 0.5755118, -1.…
## $ x2 <dbl> -1.9030527, -0.6147600, 0.6401392, -1.8199357, -0.4007813, 0.…
## $ x3 <dbl> 0.5945833, -1.8454583, -0.5208516, 0.5502561, -1.7749336, -0.…
## $ A  <chr> "a1", "a1", "a1", "a1", "a1", "a1", "a1", "a1", "a2", "a1", "…
## $ B  <chr> "b1", "b1", "b1", "b2", "b1", "b1", "b2", "b2", "b2", "b2", "…
table_2019s1_250 <- table_2019s1[1:250,]
table_2019s1_500 <- table_2019s1[251:500,]
str(table_2019s1_250)
## Classes 'tbl_df', 'tbl' and 'data.frame':    250 obs. of  6 variables:
##  $ y : num  0.858 1.07 0.782 1.195 1.065 ...
##  $ x1: num  -0.469 0.679 -1.802 -0.386 0.576 ...
##  $ x2: num  -1.903 -0.615 0.64 -1.82 -0.401 ...
##  $ x3: num  0.595 -1.845 -0.521 0.55 -1.775 ...
##  $ A : chr  "a1" "a1" "a1" "a1" ...
##  $ B : chr  "b1" "b1" "b1" "b2" ...
str(table_2019s1_500)
## Classes 'tbl_df', 'tbl' and 'data.frame':    250 obs. of  6 variables:
##  $ y : num  0.86 0.91 0.998 0.979 0.803 ...
##  $ x1: num  0.635 -1.764 -0.472 0.585 -1.755 ...
##  $ x2: num  -0.588 0.552 -1.822 -0.352 0.599 ...
##  $ x3: num  -1.809 -0.377 0.578 -1.861 -0.602 ...
##  $ A : chr  "a1" "a1" "a1" "a2" ...
##  $ B : chr  "b1" "b2" "b1" "b1" ...
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

model_2019s1_1 <- lm(y ~ x1 + x2 + x3 + A + B, data = table_2019s1_250)
summary(model_2019s1_1)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x3 + A + B, data = table_2019s1_250)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37764 -0.12857 -0.01063  0.08906  1.85501 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.39962    0.15936   8.783 2.87e-16 ***
## x1           0.28627    0.09490   3.017  0.00283 ** 
## x2           0.22936    0.09579   2.394  0.01740 *  
## x3           0.27945    0.09504   2.940  0.00359 ** 
## Aa2         -0.13200    0.02971  -4.443 1.35e-05 ***
## Bb2          0.04808    0.02956   1.627  0.10513    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2125 on 244 degrees of freedom
## Multiple R-squared:  0.1533, Adjusted R-squared:  0.1359 
## F-statistic: 8.835 on 5 and 244 DF,  p-value: 1.007e-07
## Anova Table (Type II tests)
## 
## Response: y
##            Sum Sq  Df F value    Pr(>F)    
## x1         0.4110   1  9.0998  0.002827 ** 
## x2         0.2590   1  5.7330  0.017405 *  
## x3         0.3905   1  8.6459  0.003593 ** 
## A          0.8916   1 19.7399 1.347e-05 ***
## B          0.1195   1  2.6456  0.105129    
## Residuals 11.0213 244                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##        x1        x2        x3         A         B 
## 48.920081 50.351228 49.831957  1.033647  1.023466

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

library(olsrr)
# ols_plot_diagnostics(model_2019s1_1)
ols_step_both_aic(model_2019s1_1)

ols_step_both_p(lm(log(y) ~ x1 + x2 + x3 + A + B, data = table_2019s1_250))
model_2019s1_2 <- lm(log(y) ~ x2 + A + B, data = table_2019s1_250)
summary(model_2019s1_2)
## 
## Call:
## lm(formula = log(y) ~ x2 + A + B, data = table_2019s1_250)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58842 -0.13906  0.00879  0.11250  1.15248 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.10330    0.01704  -6.064 4.97e-09 ***
## x2          -0.06149    0.01214  -5.066 7.96e-07 ***
## Aa2         -0.12183    0.02614  -4.661 5.15e-06 ***
## Bb2          0.05454    0.02624   2.079   0.0387 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1893 on 246 degrees of freedom
## Multiple R-squared:  0.1695, Adjusted R-squared:  0.1594 
## F-statistic: 16.74 on 3 and 246 DF,  p-value: 6.312e-10
## Anova Table (Type II tests)
## 
## Response: log(y)
##           Sum Sq  Df F value    Pr(>F)    
## x2        0.9202   1 25.6682 7.965e-07 ***
## A         0.7789   1 21.7278 5.154e-06 ***
## B         0.1549   1  4.3212   0.03868 *  
## Residuals 8.8191 246                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##       x2        A        B 
## 1.018554 1.007957 1.015740
##                  48.8 %      51.2 %
## (Intercept) -0.10383914 -0.10277031
## x2          -0.06187532 -0.06111380
## Aa2         -0.12265219 -0.12101237
## Bb2          0.05371814  0.05536428
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Analysis of Variance Table
## 
## Response: log(y)
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## x2          1 0.8964 0.89643 25.0051 1.088e-06 ***
## A           1 0.7486 0.74864 20.8825 7.733e-06 ***
## B           1 0.1549 0.15491  4.3212   0.03868 *  
## Residuals 246 8.8191 0.03585                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model_2019s1_3 <- lm(table_2019s1_500,formula=log(y)~ x2+A+B)
summary(model_2019s1_3)
## 
## Call:
## lm(formula = log(y) ~ x2 + A + B, data = table_2019s1_500)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34814 -0.10156 -0.00912  0.09695  0.32199 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.137812   0.012796 -10.770  < 2e-16 ***
## x2          -0.074053   0.008689  -8.523 1.59e-15 ***
## Aa2         -0.098649   0.019104  -5.164 5.00e-07 ***
## Bb2          0.055665   0.018112   3.073  0.00235 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1357 on 246 degrees of freedom
## Multiple R-squared:  0.3167, Adjusted R-squared:  0.3084 
## F-statistic: 38.01 on 3 and 246 DF,  p-value: < 2.2e-16
ols_regress(log(y)~ x2+A+B, data = table_2019s1_500)
##                          Model Summary                           
## ----------------------------------------------------------------
## R                       0.563       RMSE                  0.136 
## R-Squared               0.317       Coef. Var          -129.304 
## Adj. R-Squared          0.308       MSE                   0.018 
## Pred R-Squared          0.294       MAE                   0.111 
## ----------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                 
## --------------------------------------------------------------------
##                Sum of                                               
##               Squares         DF    Mean Square      F         Sig. 
## --------------------------------------------------------------------
## Regression      2.099          3          0.700    38.007    0.0000 
## Residual        4.528        246          0.018                     
## Total           6.627        249                                    
## --------------------------------------------------------------------
## 
##                                    Parameter Estimates                                    
## -----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta       t        Sig      lower     upper 
## -----------------------------------------------------------------------------------------
## (Intercept)    -0.138         0.013                 -10.770    0.000    -0.163    -0.113 
##          x2    -0.074         0.009       -0.452     -8.523    0.000    -0.091    -0.057 
##         Aa2    -0.099         0.019       -0.273     -5.164    0.000    -0.136    -0.061 
##         Bb2     0.056         0.018        0.163      3.073    0.002     0.020     0.091 
## -----------------------------------------------------------------------------------------
library(Metrics)
Metrics::rmse(table_2019s1_500$y,exp(predict(model_2019s1_2,table_2019s1_500)))
## [1] 0.1285634
ols_press(model_2019s1_3)
## [1] 4.681989
MPV::PRESS(model_2019s1_3)
## [1] 4.681989
sum((residuals(model_2019s1_3)/(1 - lm.influence(model_2019s1_3)$hat))^2)
## [1] 4.681989
ols_pred_rsq(model_2019s1_3)
## [1] 0.2935096
# str(model_2019s1_3)
# From 564-lab: calculate prediction power (predicted R-squared)
deviation <- table_2019s1_500$y - mean(table_2019s1_500$y)
SST <- deviation %*% deviation
# caution: PRESS(model_2019s1_3) is on the log scale while SST is on the
# raw scale, so this ratio is only a rough indication
1 - (MPV::PRESS(model_2019s1_3) / SST)
##           [,1]
## [1,] 0.2378794
# sum of squared prediction errors on the hold-out set (predict() on the
# hold-out data is needed; reusing the training fitted values would
# misalign rows)
sum((table_2019s1_500$y - exp(predict(model_2019s1_2, table_2019s1_500)))^2)
## [1] 4.13214
# in-sample RMSE of model_2019s1_3 (on the log scale)
sqrt(mean(model_2019s1_3$residuals^2))
## [1] 0.1345847
# remove outlier
table_2019s1_250[c(189,219,249),]
table_2019s1_250_noouter <- table_2019s1_250[-c(113, 189, 219, 249), ]
# sqrt(!is.na(x1)) in the original is constant 1 and apparently a typo for x1
model_2019s1_noouter <- lm(y ~ x1 + x2 + x3 + A + B, data = table_2019s1_250_noouter)
summary(model_2019s1_noouter)
plot(model_2019s1_noouter)
  1. calculate for each observation the square of the prediction errors,

  2. obtain the square root of the average of all squared prediction errors.
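The two steps above amount to a one-line helper (`prmse()` is a hypothetical name, not a function from the original analysis):

```r
# PRMSE: square each prediction error, average, take the square root
prmse <- function(y_obs, y_pred) sqrt(mean((y_obs - y_pred)^2))
prmse(c(10, 12, 14), c(11, 12, 16))   # sqrt(5/3) ~= 1.291
```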

https://blog.minitab.com/blog/adventures-in-statistics-2/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables
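As a sanity check, the hat-value shortcut used for PRESS in the analysis above, `sum((e_i / (1 - h_ii))^2)`, can be verified against brute-force leave-one-out refitting on a small simulated data set:

```r
# PRESS via hat values versus explicit leave-one-out refits
set.seed(1)
d <- data.frame(x = 1:10)
d$y <- 2 + 0.5 * d$x + rnorm(10)
fit <- lm(y ~ x, data = d)

press_hat <- sum((residuals(fit) / (1 - lm.influence(fit)$hat))^2)
press_loo <- sum(sapply(seq_len(nrow(d)), function(i) {
  (d$y[i] - predict(lm(y ~ x, data = d[-i, ]), d[i, ]))^2
}))
all.equal(press_hat, press_loo)   # TRUE
```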

2019S2

[14.4] [566-fe-4] [Example 8.4]

An experiment was conducted to compare 4 wool fiber treatments (Trt) at 7 dry cycle revolutions (Rev) over 4 experimental runs (Run) (i.e., the blocks). The outcome measured from this experiment was the top shrinkage (Shrink) of the fiber. A restriction on the randomization: within each experimental run (blocks), wool fiber treatments were randomized to whole plots, and within each whole plot, measurements were obtained for all of 7 dry cycle revolutions (split plot treatments). In other words, the experiment was set as a split-plot design with:

  1. whole plot (wool fiber treatment) treatments: untreated, alcoholic potash 15 Sec, alcoholic potash 4Min, and alcoholic potash 15Min;

  2. subplot treatments: dry cycle revolutions (200 to 1400 by 200); and

  3. blocks: 4 experimental runs (possibly different days).

Do a full analysis and report your findings for the experiment above (data in “Wool-Shrink.xlsx”), using a split plot design where both Trt and Rev are treated as categorical variables.
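Before the full analysis, the split-plot error structure can be sketched with base R's `aov()` and an `Error()` term (whole-plot error stratum = Run:Trt). The data below are simulated stand-ins just to show the strata; the actual analysis that follows uses GAD and lme4.

```r
# Simulated stand-in with the same layout: 4 runs x 4 treatments x 7 revs
set.seed(1)
d <- expand.grid(Run = factor(1:4), Trt = factor(1:4),
                 Rev = factor(seq(200, 1400, by = 200)))
d$Shrink <- rnorm(nrow(d))

# Whole-plot stratum (Run:Trt) tests Trt; the Within stratum tests
# Rev and Trt:Rev.
m <- aov(Shrink ~ Trt * Rev + Error(Run/Trt), data = d)
names(summary(m))
```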

table_2019s2 <- readxl::read_xlsx("qe_lab/WoolShrink.xlsx")
str(table_2019s2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    112 obs. of  4 variables:
##  $ Run   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Trt   : num  1 1 1 1 1 1 1 2 2 2 ...
##  $ Rev   : num  200 400 600 800 1000 1200 1400 200 400 600 ...
##  $ Shrink: num  8 18.5 29 34.3 37.5 40.2 43.2 10.8 13.2 21 ...
library(dplyr)
dplyr::glimpse(table_2019s2)
## Observations: 112
## Variables: 4
## $ Run    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Trt    <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, …
## $ Rev    <dbl> 200, 400, 600, 800, 1000, 1200, 1400, 200, 400, 600, 800,…
## $ Shrink <dbl> 8.0, 18.5, 29.0, 34.3, 37.5, 40.2, 43.2, 10.8, 13.2, 21.0…
table_2019s2$Run <- factor(table_2019s2$Run,labels=c("Day1","Day2","Day3","Day4"))
table_2019s2$Trt <- factor(table_2019s2$Trt,labels=c("untreated","15 Sec","4Min","15Min"))
table_2019s2$Rev <- as.factor(table_2019s2$Rev)
str(table_2019s2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    112 obs. of  4 variables:
##  $ Run   : Factor w/ 4 levels "Day1","Day2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Trt   : Factor w/ 4 levels "untreated","15 Sec",..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Rev   : Factor w/ 7 levels "200","400","600",..: 1 2 3 4 5 6 7 1 2 3 ...
##  $ Shrink: num  8 18.5 29 34.3 37.5 40.2 43.2 10.8 13.2 21 ...
dplyr::glimpse(table_2019s2)
## Observations: 112
## Variables: 4
## $ Run    <fct> Day1, Day1, Day1, Day1, Day1, Day1, Day1, Day1, Day1, Day…
## $ Trt    <fct> untreated, untreated, untreated, untreated, untreated, un…
## $ Rev    <fct> 200, 400, 600, 800, 1000, 1200, 1400, 200, 400, 600, 800,…
## $ Shrink <dbl> 8.0, 18.5, 29.0, 34.3, 37.5, 40.2, 43.2, 10.8, 13.2, 21.0…

The above plots show that there is little difference in average shrinkage across days, that average shrinkage is lower for the longer treatments, and that average shrinkage is higher at higher revolutions.

The tables show the same thing with the numerical summaries for each factor level and their combinations.

library(GAD)
table_2019s2$Run_r <- as.random(table_2019s2$Run)
table_2019s2$Trt_f <- as.fixed(table_2019s2$Trt)
table_2019s2$Rev_f <- as.fixed(table_2019s2$Rev)
model_2019s2_1 <- aov(formula = Shrink ~ Run_r+Trt_f + Trt_f%in%Run_r+ Rev_f%in%Run_r + Rev_f + Trt_f:Rev_f, data=table_2019s2)
pander::pander(gad(model_2019s2_1))
Analysis of Variance Table

              Df   Sum Sq   Mean Sq   F value   Pr(>F)
Run_r          3    124.3     41.43     36.47   5.099e-13
Trt_f          3     3013      1004     78.84   8.81e-07
Rev_f          6    11052      1842     876.8   3.405e-21
Run_r:Trt_f    9    114.6     12.74     11.21   1.218e-09
Run_r:Rev_f   18    37.81     2.101     1.849   0.04245
Trt_f:Rev_f   18    269.5     14.97     13.18   8.477e-14
Residual      54    61.35     1.136

The results show that all main effects and interactions are significant at the 0.05 level; the Run-by-Revolutions interaction is only marginally significant (p = 0.042).

library("lme4")
model_2019s2_2 <- lmer(formula = Shrink ~ (1|Run) + Trt + (1|Run:Trt) + Rev + (1|Run:Rev) + Trt:Rev, data=table_2019s2 , REML = TRUE)
summary(model_2019s2_2)$varcor
##  Groups   Name        Std.Dev.
##  Run:Rev  (Intercept) 0.49104 
##  Run:Trt  (Intercept) 1.28736 
##  Run      (Intercept) 0.99516 
##  Residual             1.06587
pander::pander(confint(model_2019s2_2)[1:4,1:2])

Computing profile confidence intervals …

          2.5 %  97.5 %
.sig01        0   0.726
.sig02   0.7415    1.82
.sig03        0   2.512
.sigma   0.7906   1.097

The variance-component estimates show that the variance of the Run-by-Revolutions interaction is negligible, so that interaction term is dropped.

model_2019s2_3 <- aov(formula = Shrink ~ Run_r+Trt_f + Trt_f%in%Run_r+ Rev_f + Trt_f:Rev_f, data=table_2019s2)
pander::pander(gad(model_2019s2_3))
Analysis of Variance Table

              Df   Sum Sq   Mean Sq   F value   Pr(>F)
Run_r          3    124.3     41.43     30.08   1.024e-12
Trt_f          3     3013      1004     78.84   8.81e-07
Rev_f          6    11052      1842      1337   1.01e-71
Run_r:Trt_f    9    114.6     12.74     9.249   3.546e-09
Trt_f:Rev_f   18    269.5     14.97     10.87   4.62e-14
Residual      72    99.16     1.377
model_2019s2_4 <- lmer(formula = Shrink ~ Trt * Rev + (1 | Run) + (1 | Run:Trt), data = table_2019s2, REML = TRUE)

The ANOVA table for the new model shows that the remaining interactions (Run-by-Treatment and Treatment-by-Revolutions) are significant. This means the effect of treatment on shrinkage is not independent of the run or of the revolutions, so the simple effects must be tested.

On Day 2, the mean shrinkage of the 15-Sec and 4-Min treatments does not differ significantly. On every other day, mean shrinkage differs significantly between every pair of treatments.

For a given treatment, the day-to-day changes do not give consistent results.

For the untreated wool, mean shrinkage does not differ significantly between 1200 and 1400 revolutions. For all other treatments, mean shrinkage differs significantly between every pair of revolutions.

For a given revolution, the 15-Sec and 4-Min treatments do not differ significantly in mean shrinkage.

  • Conclusion

For a given treatment, higher revolutions produce greater shrinkage.

In most cases, longer alcoholic potash treatment produces less shrinkage, and this effect is more pronounced at higher revolutions.

  • Model Adequacy Checking

The plot of residuals versus predicted shrinkage shows no systematic pattern, so the fitted model adequately describes the relationship between mean shrinkage and the days, revolutions, and treatments.

The residuals are roughly symmetric about zero, so the zero-mean assumption is not violated, and their vertical spread is about the same across predicted values, so the constant-variance assumption is not violated.

The points in the normal Q-Q plot (bottom left) fall along a straight line, and the histogram of residuals (top right) is approximately normal; neither plot indicates a violation of the normality assumption for the residuals.